2025-12-04T08:55:53.4835536Z Current runner version: '2.330.0'
2025-12-04T08:55:53.4841316Z Runner name: 'i-0e5520d20214059b0'
2025-12-04T08:55:53.4842149Z Runner group name: 'Default'
2025-12-04T08:55:53.4843083Z Machine name: 'ip-10-1-34-86'
2025-12-04T08:55:53.4845990Z ##[group]GITHUB_TOKEN Permissions
2025-12-04T08:55:53.4847981Z Contents: read
2025-12-04T08:55:53.4848480Z Metadata: read
2025-12-04T08:55:53.4848934Z ##[endgroup]
2025-12-04T08:55:53.4850903Z Secret source: Actions
2025-12-04T08:55:53.4851605Z Prepare workflow directory
2025-12-04T08:55:53.5329470Z Prepare all required actions
2025-12-04T08:55:53.5363211Z Getting action download info
2025-12-04T08:55:53.9157077Z Download action repository 'pytorch/test-infra@main' (SHA:39aa74d619174326f4e2fb0e216151c2f29d9ffd)
2025-12-04T08:55:56.3553774Z Download action repository 'pytorch/pytorch@main' (SHA:eabb7ad2128580ef674446027b95bcf4e21e8df3)
2025-12-04T08:56:12.9315896Z Download action repository 'actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065' (SHA:a26af69be951a213d495a4c3e4e4022e16d87065)
2025-12-04T08:56:13.2910361Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-12-04T08:56:13.5227300Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076)
2025-12-04T08:56:13.6948496Z Download action repository 'seemethere/download-artifact-s3@1da556a7aa0a088e3153970611f6c432d58e80e6' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T08:56:13.9285517Z Download action repository 'seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T08:56:14.2146240Z Getting action download info
2025-12-04T08:56:14.3381694Z Download action repository 'actions/checkout@v4' (SHA:34e114876b0b11c390a56381ad16ebd13914f8d5)
2025-12-04T08:56:14.6645041Z Getting action download info
2025-12-04T08:56:14.7961093Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-12-04T08:56:15.0462376Z Getting action download info
2025-12-04T08:56:15.1743336Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482)
2025-12-04T08:56:15.3900124Z Getting action download info
2025-12-04T08:56:15.5658901Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/main (ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32)
2025-12-04T08:56:15.5662879Z ##[group] Inputs
2025-12-04T08:56:15.5663177Z   build-environment: linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T08:56:15.5670343Z   test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}
2025-12-04T08:56:15.5677831Z   docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:56:15.5678443Z   sync-tag: 
2025-12-04T08:56:15.5679121Z   timeout-minutes: 360
2025-12-04T08:56:15.5679321Z   use-gha: 
2025-12-04T08:56:15.5679481Z   dashboard-tag: 
2025-12-04T08:56:15.5679669Z   s3-bucket: gha-artifacts
2025-12-04T08:56:15.5679866Z   aws-role-to-assume: 
2025-12-04T08:56:15.5680469Z   disable-monitor: false
2025-12-04T08:56:15.5680726Z   monitor-log-interval: 5
2025-12-04T08:56:15.5681025Z   monitor-data-collect-interval: 1
2025-12-04T08:56:15.5681386Z ##[endgroup]
2025-12-04T08:56:15.5682000Z Complete job name: linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check)
2025-12-04T08:56:15.6278825Z A job started hook has been configured by the self-hosted runner administrator
2025-12-04T08:56:15.6374481Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh'
2025-12-04T08:56:15.6384328Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:56:15.6384888Z ##[endgroup]
2025-12-04T08:56:16.9129657Z Runner Type: lf.linux.g6.4xlarge.experimental.nvidia.gpu
2025-12-04T08:56:16.9130138Z Instance Type: g6.4xlarge
2025-12-04T08:56:16.9130346Z AMI Name: unknown
2025-12-04T08:56:16.9172640Z AMI ID: ami-08982f1c5bf93d976
2025-12-04T08:56:21.6666630Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main
2025-12-04T08:56:21.6666997Z with:
2025-12-04T08:56:21.6667479Z   github-secret: ***
2025-12-04T08:56:21.6668009Z   instructions: All testing is done inside the container, to start an interactive session run:
  docker exec -it $(docker container ps --format '{{.ID}}') bash

2025-12-04T08:56:21.6668551Z   activate-with-label: false
2025-12-04T08:56:21.6668753Z   label: with-ssh
2025-12-04T08:56:21.6668932Z   remove-existing-keys: true
2025-12-04T08:56:21.6669134Z   fail-silently: true
2025-12-04T08:56:21.6669341Z env:
2025-12-04T08:56:21.6669491Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:21.6669681Z ##[endgroup]
2025-12-04T08:56:21.7719980Z Please see https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info.
2025-12-04T08:56:21.7721864Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys
2025-12-04T08:56:21.7849926Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-12-04T08:56:21.7850463Z with:
2025-12-04T08:56:21.7850642Z   no-sudo: true
2025-12-04T08:56:21.7850813Z   submodules: recursive
2025-12-04T08:56:21.7851011Z   fetch-depth: 0
2025-12-04T08:56:21.7851182Z env:
2025-12-04T08:56:21.7851328Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:21.7851516Z ##[endgroup]
2025-12-04T08:56:21.7914825Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T08:56:21.7915557Z [36;1mecho "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"[0m
2025-12-04T08:56:21.7927068Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:56:21.7927358Z env:
2025-12-04T08:56:21.7927548Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:21.7927775Z ##[endgroup]
2025-12-04T08:56:21.8003580Z ##[group]Run # Use all available CPUs for fetching
2025-12-04T08:56:21.8003924Z [36;1m# Use all available CPUs for fetching[0m
2025-12-04T08:56:21.8004184Z [36;1mcd "${GITHUB_WORKSPACE}"[0m
2025-12-04T08:56:21.8004432Z [36;1mgit config --global fetch.parallel 0[0m
2025-12-04T08:56:21.8004717Z [36;1mgit config --global submodule.fetchJobs 0[0m
2025-12-04T08:56:21.8026067Z [36;1m[0m
2025-12-04T08:56:21.8026565Z [36;1m# Clean workspace. The default checkout action should also do this, but[0m
2025-12-04T08:56:21.8027148Z [36;1m# do it here as well just in case[0m
2025-12-04T08:56:21.8027565Z [36;1mif [[ -d .git ]]; then[0m
2025-12-04T08:56:21.8027939Z [36;1m  if [ -z "${NO_SUDO}" ]; then[0m
2025-12-04T08:56:21.8028396Z [36;1m    sudo git clean -ffdx[0m
2025-12-04T08:56:21.8028758Z [36;1m  else[0m
2025-12-04T08:56:21.8029041Z [36;1m    git clean -ffdx[0m
2025-12-04T08:56:21.8029375Z [36;1m  fi[0m
2025-12-04T08:56:21.8029654Z [36;1mfi[0m
2025-12-04T08:56:21.8039380Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:56:21.8039684Z env:
2025-12-04T08:56:21.8039999Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:21.8040219Z   NO_SUDO: true
2025-12-04T08:56:21.8040388Z ##[endgroup]
2025-12-04T08:56:21.8152953Z ##[group]Run actions/checkout@v4
2025-12-04T08:56:21.8153177Z with:
2025-12-04T08:56:21.8153363Z   ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T08:56:21.8153615Z   fetch-depth: 0
2025-12-04T08:56:21.8153788Z   submodules: recursive
2025-12-04T08:56:21.8153978Z   show-progress: false
2025-12-04T08:56:21.8154171Z   repository: pytorch/pytorch
2025-12-04T08:56:21.8154488Z   token: ***
2025-12-04T08:56:21.8154651Z   ssh-strict: true
2025-12-04T08:56:21.8154819Z   ssh-user: git
2025-12-04T08:56:21.8154993Z   persist-credentials: true
2025-12-04T08:56:21.8155182Z   clean: true
2025-12-04T08:56:21.8155370Z   sparse-checkout-cone-mode: true
2025-12-04T08:56:21.8155589Z   fetch-tags: false
2025-12-04T08:56:21.8155746Z   lfs: false
2025-12-04T08:56:21.8155909Z   set-safe-directory: true
2025-12-04T08:56:21.8156108Z env:
2025-12-04T08:56:21.8156270Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:21.8156452Z ##[endgroup]
2025-12-04T08:56:21.9177593Z Syncing repository: pytorch/pytorch
2025-12-04T08:56:21.9178801Z ##[group]Getting Git version info
2025-12-04T08:56:21.9179180Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T08:56:21.9179695Z [command]/usr/bin/git version
2025-12-04T08:56:21.9179904Z git version 2.50.1
2025-12-04T08:56:21.9193393Z ##[endgroup]
2025-12-04T08:56:21.9202503Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/f6dcb4ef-2c6d-454a-86d9-cc125074ac45/.gitconfig'
2025-12-04T08:56:21.9221809Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/f6dcb4ef-2c6d-454a-86d9-cc125074ac45' before making global git config changes
2025-12-04T08:56:21.9222762Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T08:56:21.9226391Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T08:56:21.9262708Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T08:56:21.9265881Z ##[group]Initializing the repository
2025-12-04T08:56:21.9269427Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T08:56:21.9307336Z hint: Using 'master' as the name for the initial branch. This default branch name
2025-12-04T08:56:21.9308223Z hint: is subject to change. To configure the initial branch name to use in all
2025-12-04T08:56:21.9308827Z hint: of your new repositories, which will suppress this warning, call:
2025-12-04T08:56:21.9309455Z hint:
2025-12-04T08:56:21.9309877Z hint: 	git config --global init.defaultBranch <name>
2025-12-04T08:56:21.9310325Z hint:
2025-12-04T08:56:21.9310656Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2025-12-04T08:56:21.9311194Z hint: 'development'. The just-created branch can be renamed via this command:
2025-12-04T08:56:21.9311591Z hint:
2025-12-04T08:56:21.9311815Z hint: 	git branch -m <name>
2025-12-04T08:56:21.9312056Z hint:
2025-12-04T08:56:21.9312374Z hint: Disable this message with "git config set advice.defaultBranchName false"
2025-12-04T08:56:21.9313026Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2025-12-04T08:56:21.9319180Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2025-12-04T08:56:21.9346237Z ##[endgroup]
2025-12-04T08:56:21.9346845Z ##[group]Disabling automatic garbage collection
2025-12-04T08:56:21.9349494Z [command]/usr/bin/git config --local gc.auto 0
2025-12-04T08:56:21.9374607Z ##[endgroup]
2025-12-04T08:56:21.9375169Z ##[group]Setting up auth
2025-12-04T08:56:21.9380239Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T08:56:21.9406203Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T08:56:21.9733894Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T08:56:21.9760564Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T08:56:22.0070229Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T08:56:22.0099187Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T08:56:22.0412325Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T08:56:22.0464275Z ##[endgroup]
2025-12-04T08:56:22.0464893Z ##[group]Fetching the repository
2025-12-04T08:56:22.0472484Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2025-12-04T08:57:05.5724597Z From https://github.com/pytorch/pytorch
2025-12-04T08:57:05.5725167Z  * [new branch]              2.6.0.dev20241004+          -> origin/2.6.0.dev20241004+
2025-12-04T08:57:05.5725963Z  * [new branch]              2.9.1                       -> origin/2.9.1
2025-12-04T08:57:05.5726555Z  * [new branch]              AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest
2025-12-04T08:57:05.5727208Z  * [new branch]              Flamefire-patch-1           -> origin/Flamefire-patch-1
2025-12-04T08:57:05.5729046Z  * [new branch]              HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes
2025-12-04T08:57:05.5730527Z  * [new branch]              HOPrintFunc                 -> origin/HOPrintFunc
2025-12-04T08:57:05.5733318Z  * [new branch]              IvanKobzarev/stack/1        -> origin/IvanKobzarev/stack/1
2025-12-04T08:57:05.5735923Z  * [new branch]              NicoshevSVE128              -> origin/NicoshevSVE128
2025-12-04T08:57:05.5737206Z  * [new branch]              PR-AOTInductorNoneBug       -> origin/PR-AOTInductorNoneBug
2025-12-04T08:57:05.5739281Z  * [new branch]              PR-AOTInductorNoneBugFix    -> origin/PR-AOTInductorNoneBugFix
2025-12-04T08:57:05.5740765Z  * [new branch]              PR-FixConfigsIssue          -> origin/PR-FixConfigsIssue
2025-12-04T08:57:05.5742474Z  * [new branch]              PR-NoneBugFix-viable        -> origin/PR-NoneBugFix-viable
2025-12-04T08:57:05.5744048Z  * [new branch]              PR-ResetToZero              -> origin/PR-ResetToZero
2025-12-04T08:57:05.5745865Z  * [new branch]              Update-Flash-Packaging      -> origin/Update-Flash-Packaging
2025-12-04T08:57:05.5747238Z  * [new branch]              VLA_exp                     -> origin/VLA_exp
2025-12-04T08:57:05.5749330Z  * [new branch]              activation_bench            -> origin/activation_bench
2025-12-04T08:57:05.5751404Z  * [new branch]              addmm-heuristic             -> origin/addmm-heuristic
2025-12-04T08:57:05.5753608Z  * [new branch]              adi/onednn_aarch64          -> origin/adi/onednn_aarch64
2025-12-04T08:57:05.5755275Z  * [new branch]              adi/test                    -> origin/adi/test
2025-12-04T08:57:05.5756860Z  * [new branch]              adi/test_bgemm              -> origin/adi/test_bgemm
2025-12-04T08:57:05.5758486Z  * [new branch]              adi/test_m8g                -> origin/adi/test_m8g
2025-12-04T08:57:05.5760273Z  * [new branch]              adi/test_onednn             -> origin/adi/test_onednn
2025-12-04T08:57:05.5761928Z  * [new branch]              adi/test_onednn_v3.9        -> origin/adi/test_onednn_v3.9
2025-12-04T08:57:05.5763566Z  * [new branch]              adi/test_presve_change      -> origin/adi/test_presve_change
2025-12-04T08:57:05.5765115Z  * [new branch]              adi/test_timm               -> origin/adi/test_timm
2025-12-04T08:57:05.5767084Z  * [new branch]              adi/testpresve_change       -> origin/adi/testpresve_change
2025-12-04T08:57:05.5769931Z  * [new branch]              aditew01/test/vec_bf16      -> origin/aditew01/test/vec_bf16
2025-12-04T08:57:05.5771599Z  * [new branch]              ah-globalfeedback-hook      -> origin/ah-globalfeedback-hook
2025-12-04T08:57:05.5773571Z  * [new branch]              albanD-patch-1              -> origin/albanD-patch-1
2025-12-04T08:57:05.5775121Z  * [new branch]              also-surround-shimh         -> origin/also-surround-shimh
2025-12-04T08:57:05.5777530Z  * [new branch]              angelayi/aot_compile        -> origin/angelayi/aot_compile
2025-12-04T08:57:05.5779065Z  * [new branch]              angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files
2025-12-04T08:57:05.5780653Z  * [new branch]              angelayi/benchmark          -> origin/angelayi/benchmark
2025-12-04T08:57:05.5782383Z  * [new branch]              angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization
2025-12-04T08:57:05.5783912Z  * [new branch]              angelayi/cpp_loader         -> origin/angelayi/cpp_loader
2025-12-04T08:57:05.5785528Z  * [new branch]              angelayi/inductor_const     -> origin/angelayi/inductor_const
2025-12-04T08:57:05.5787048Z  * [new branch]              angelayi/lstm               -> origin/angelayi/lstm
2025-12-04T08:57:05.5789113Z  * [new branch]              angelayi/no_so_weight       -> origin/angelayi/no_so_weight
2025-12-04T08:57:05.5791075Z  * [new branch]              angelayi/scan_layers        -> origin/angelayi/scan_layers
2025-12-04T08:57:05.5792759Z  * [new branch]              angelayi/side_eff           -> origin/angelayi/side_eff
2025-12-04T08:57:05.5794436Z  * [new branch]              angelayi/state_dict         -> origin/angelayi/state_dict
2025-12-04T08:57:05.5796192Z  * [new branch]              angelayi/symint_input       -> origin/angelayi/symint_input
2025-12-04T08:57:05.5798051Z  * [new branch]              angelayi/symm_mem           -> origin/angelayi/symm_mem
2025-12-04T08:57:05.5799694Z  * [new branch]              angelayi/test_cpp           -> origin/angelayi/test_cpp
2025-12-04T08:57:05.5801560Z  * [new branch]              angelayi/torch_size         -> origin/angelayi/torch_size
2025-12-04T08:57:05.5803143Z  * [new branch]              annotate_assert             -> origin/annotate_assert
2025-12-04T08:57:05.5804827Z  * [new branch]              annotate_fallback_kernel    -> origin/annotate_fallback_kernel
2025-12-04T08:57:05.5806532Z  * [new branch]              annotation_deepcopy         -> origin/annotation_deepcopy
2025-12-04T08:57:05.5808163Z  * [new branch]              annotation_dynamo           -> origin/annotation_dynamo
2025-12-04T08:57:05.5809791Z  * [new branch]              aot_eager_stack_trace       -> origin/aot_eager_stack_trace
2025-12-04T08:57:05.5811425Z  * [new branch]              aoti-cuda-alloc             -> origin/aoti-cuda-alloc
2025-12-04T08:57:05.5813076Z  * [new branch]              aoti_const_device           -> origin/aoti_const_device
2025-12-04T08:57:05.5814748Z  * [new branch]              aoti_fqn_name_interface     -> origin/aoti_fqn_name_interface
2025-12-04T08:57:05.5816390Z  * [new branch]              aoti_package_weights_binary -> origin/aoti_package_weights_binary
2025-12-04T08:57:05.5818256Z  * [new branch]              aoti_target_windows         -> origin/aoti_target_windows
2025-12-04T08:57:05.5821002Z  * [new branch]              arsh/feat/inductor_check_profiling -> origin/arsh/feat/inductor_check_profiling
2025-12-04T08:57:05.5822598Z  * [new branch]              async_tp                    -> origin/async_tp
2025-12-04T08:57:05.5824365Z  * [new branch]              atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124
2025-12-04T08:57:05.5826074Z  * [new branch]              atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1
2025-12-04T08:57:05.5827783Z  * [new branch]              atalman-patch-2             -> origin/atalman-patch-2
2025-12-04T08:57:05.5829577Z  * [new branch]              atalman-patch-3             -> origin/atalman-patch-3
2025-12-04T08:57:05.5831273Z  * [new branch]              atalman-patch-4             -> origin/atalman-patch-4
2025-12-04T08:57:05.5833030Z  * [new branch]              atalman-patch-5             -> origin/atalman-patch-5
2025-12-04T08:57:05.5834741Z  * [new branch]              atalman-patch-6             -> origin/atalman-patch-6
2025-12-04T08:57:05.5836431Z  * [new branch]              atalman-patch-7             -> origin/atalman-patch-7
2025-12-04T08:57:05.5838159Z  * [new branch]              atalman-patch-8             -> origin/atalman-patch-8
2025-12-04T08:57:05.5839847Z  * [new branch]              atalman_inductor_2.3.1      -> origin/atalman_inductor_2.3.1
2025-12-04T08:57:05.5841689Z  * [new branch]              atalman_inductor_2.4.0      -> origin/atalman_inductor_2.4.0
2025-12-04T08:57:05.5843399Z  * [new branch]              atalman_inductor_2.4.x      -> origin/atalman_inductor_2.4.x
2025-12-04T08:57:05.5845173Z  * [new branch]              attention_benchmarking_clean -> origin/attention_benchmarking_clean
2025-12-04T08:57:05.5847346Z  * [new branch]              bahuang/dt_fix_scalar_add   -> origin/bahuang/dt_fix_scalar_add
2025-12-04T08:57:05.5848903Z  * [new branch]              bahuang/fix_debug_mode      -> origin/bahuang/fix_debug_mode
2025-12-04T08:57:05.5850582Z  * [new branch]              bahuang/fix_expand          -> origin/bahuang/fix_expand
2025-12-04T08:57:05.5852192Z  * [new branch]              bahuang/test                -> origin/bahuang/test
2025-12-04T08:57:05.5854813Z  * [new branch]              base/1.5                    -> origin/base/1.5
2025-12-04T08:57:05.5856822Z  * [new branch]              batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention
2025-12-04T08:57:05.5858398Z  * [new branch]              bench_scaled_mm_ops         -> origin/bench_scaled_mm_ops
2025-12-04T08:57:05.5860197Z  * [new branch]              benchmark-updates           -> origin/benchmark-updates
2025-12-04T08:57:05.5861761Z  * [new branch]              benchmarking-script         -> origin/benchmarking-script
2025-12-04T08:57:05.5863972Z  * [new branch]              bertmaher/pinbump26         -> origin/bertmaher/pinbump26
2025-12-04T08:57:05.5866205Z  * [new branch]              bertrand/cutlass            -> origin/bertrand/cutlass
2025-12-04T08:57:05.5868397Z  * [new branch]              bf/bug-static-input         -> origin/bf/bug-static-input
2025-12-04T08:57:05.5869988Z  * [new branch]              bf/cg-backend               -> origin/bf/cg-backend
2025-12-04T08:57:05.5871529Z  * [new branch]              bf/cg-nccl-test             -> origin/bf/cg-nccl-test
2025-12-04T08:57:05.5873146Z  * [new branch]              bf/cg-remove-check          -> origin/bf/cg-remove-check
2025-12-04T08:57:05.5874963Z  * [new branch]              bf/clean-torchbench-hf      -> origin/bf/clean-torchbench-hf
2025-12-04T08:57:05.5876675Z  * [new branch]              bf/combo-debug-log          -> origin/bf/combo-debug-log
2025-12-04T08:57:05.5878260Z  * [new branch]              bf/cudagraph                -> origin/bf/cudagraph
2025-12-04T08:57:05.5880426Z  * [new branch]              bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation
2025-12-04T08:57:05.5882446Z  * [new branch]              bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark
2025-12-04T08:57:05.5883713Z  * [new branch]              bf/cudagraph-partition      -> origin/bf/cudagraph-partition
2025-12-04T08:57:05.5885563Z  * [new branch]              bf/donated-buffer-bench     -> origin/bf/donated-buffer-bench
2025-12-04T08:57:05.5887239Z  * [new branch]              bf/dynamo-partition         -> origin/bf/dynamo-partition
2025-12-04T08:57:05.5888870Z  * [new branch]              bf/lite                     -> origin/bf/lite
2025-12-04T08:57:05.5890524Z  * [new branch]              bf/pa-non-divisible         -> origin/bf/pa-non-divisible
2025-12-04T08:57:05.5892255Z  * [new branch]              bf/partition-cache-free-symbols -> origin/bf/partition-cache-free-symbols
2025-12-04T08:57:05.5893988Z  * [new branch]              bf/partition-memory-plan    -> origin/bf/partition-memory-plan
2025-12-04T08:57:05.5895660Z  * [new branch]              bf/partition-move-cpu       -> origin/bf/partition-move-cpu
2025-12-04T08:57:05.5897373Z  * [new branch]              bf/partition-view-fallback  -> origin/bf/partition-view-fallback
2025-12-04T08:57:05.5899005Z  * [new branch]              bf/remove-check-55b0c39d    -> origin/bf/remove-check-55b0c39d
2025-12-04T08:57:05.5900653Z  * [new branch]              bf/timm-nov-26-2025         -> origin/bf/timm-nov-26-2025
2025-12-04T08:57:05.5902382Z  * [new branch]              bf/transformer-pin-4-57-3   -> origin/bf/transformer-pin-4-57-3
2025-12-04T08:57:05.5904028Z  * [new branch]              bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492
2025-12-04T08:57:05.5905641Z  * [new branch]              bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb
2025-12-04T08:57:05.5907255Z  * [new branch]              bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129
2025-12-04T08:57:05.5908831Z  * [new branch]              bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d
2025-12-04T08:57:05.5910471Z  * [new branch]              bisect_perf_hf_T5_5268754e  -> origin/bisect_perf_hf_T5_5268754e
2025-12-04T08:57:05.5912036Z  * [new branch]              bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c
2025-12-04T08:57:05.5913646Z  * [new branch]              bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c
2025-12-04T08:57:05.5915182Z  * [new branch]              bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f
2025-12-04T08:57:05.5916845Z  * [new branch]              bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0
2025-12-04T08:57:05.5919006Z  * [new branch]              bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149
2025-12-04T08:57:05.5920389Z  * [new branch]              bisect_perf_hf_T5_d65f194a  -> origin/bisect_perf_hf_T5_d65f194a
2025-12-04T08:57:05.5922184Z  * [new branch]              bisect_perf_hf_T5_da94ab0b  -> origin/bisect_perf_hf_T5_da94ab0b
2025-12-04T08:57:05.5923761Z  * [new branch]              bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new
2025-12-04T08:57:05.5925395Z  * [new branch]              bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8
2025-12-04T08:57:05.5926961Z  * [new branch]              bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2
2025-12-04T08:57:05.5928535Z  * [new branch]              bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563
2025-12-04T08:57:05.5930979Z  * [new branch]              brister/fx_device_type      -> origin/brister/fx_device_type
2025-12-04T08:57:05.5932584Z  * [new branch]              brister/test_inductor_all_fx -> origin/brister/test_inductor_all_fx
2025-12-04T08:57:05.5934219Z  * [new branch]              brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check
2025-12-04T08:57:05.5935811Z  * [new branch]              bwd-backup                  -> origin/bwd-backup
2025-12-04T08:57:05.5937598Z  * [new branch]              c57382a49                   -> origin/c57382a49
2025-12-04T08:57:05.5939220Z  * [new branch]              ca_0431d47eaa               -> origin/ca_0431d47eaa
2025-12-04T08:57:05.5940852Z  * [new branch]              ca_fix_0431d47eaa           -> origin/ca_fix_0431d47eaa
2025-12-04T08:57:05.5943152Z  * [new branch]              camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push
2025-12-04T08:57:05.5944894Z  * [new branch]              cccclai-patch-1             -> origin/cccclai-patch-1
2025-12-04T08:57:05.5946683Z  * [new branch]              cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_
2025-12-04T08:57:05.5948367Z  * [new branch]              cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_
2025-12-04T08:57:05.5950086Z  * [new branch]              cherry-pick-162208-by-pytorch_bot_bot_ -> origin/cherry-pick-162208-by-pytorch_bot_bot_
2025-12-04T08:57:05.5951714Z  * [new branch]              cherry-pick-163169-by-pytorch_bot_bot_ -> origin/cherry-pick-163169-by-pytorch_bot_bot_
2025-12-04T08:57:05.5953383Z  * [new branch]              cherry-pick-165086-by-pytorch_bot_bot_ -> origin/cherry-pick-165086-by-pytorch_bot_bot_
2025-12-04T08:57:05.5955183Z  * [new branch]              cherry-pick-165514-by-pytorch_bot_bot_ -> origin/cherry-pick-165514-by-pytorch_bot_bot_
2025-12-04T08:57:05.5956879Z  * [new branch]              cherry-pick-165601-by-pytorch_bot_bot_ -> origin/cherry-pick-165601-by-pytorch_bot_bot_
2025-12-04T08:57:05.5958550Z  * [new branch]              cherry-pick-165667-by-pytorch_bot_bot_ -> origin/cherry-pick-165667-by-pytorch_bot_bot_
2025-12-04T08:57:05.5960262Z  * [new branch]              cherry-pick-165815-by-pytorch_bot_bot_ -> origin/cherry-pick-165815-by-pytorch_bot_bot_
2025-12-04T08:57:05.5962043Z  * [new branch]              cherry-pick-165922-by-pytorch_bot_bot_ -> origin/cherry-pick-165922-by-pytorch_bot_bot_
2025-12-04T08:57:05.5963729Z  * [new branch]              cherry-pick-166148-by-pytorch_bot_bot_ -> origin/cherry-pick-166148-by-pytorch_bot_bot_
2025-12-04T08:57:05.5965375Z  * [new branch]              cherry-pick-166181-by-pytorch_bot_bot_ -> origin/cherry-pick-166181-by-pytorch_bot_bot_
2025-12-04T08:57:05.5967097Z  * [new branch]              cherry-pick-166404-by-pytorch_bot_bot_ -> origin/cherry-pick-166404-by-pytorch_bot_bot_
2025-12-04T08:57:05.5968786Z  * [new branch]              cherry-pick-166427-by-pytorch_bot_bot_ -> origin/cherry-pick-166427-by-pytorch_bot_bot_
2025-12-04T08:57:05.5970592Z  * [new branch]              cherry-pick-166480-by-pytorch_bot_bot_ -> origin/cherry-pick-166480-by-pytorch_bot_bot_
2025-12-04T08:57:05.5972004Z  * [new branch]              cherry-pick-166570-by-pytorch_bot_bot_ -> origin/cherry-pick-166570-by-pytorch_bot_bot_
2025-12-04T08:57:05.5973832Z  * [new branch]              cherry-pick-166993-by-pytorch_bot_bot_ -> origin/cherry-pick-166993-by-pytorch_bot_bot_
2025-12-04T08:57:05.5975472Z  * [new branch]              cherry-pick-167111-by-pytorch_bot_bot_ -> origin/cherry-pick-167111-by-pytorch_bot_bot_
2025-12-04T08:57:05.5977255Z  * [new branch]              cherry-pick-167478-by-pytorch_bot_bot_ -> origin/cherry-pick-167478-by-pytorch_bot_bot_
2025-12-04T08:57:05.5978904Z  * [new branch]              cherry_pick_166036_166040   -> origin/cherry_pick_166036_166040
2025-12-04T08:57:05.5980586Z  * [new branch]              cherry_pick_166457          -> origin/cherry_pick_166457
2025-12-04T08:57:05.5982289Z  * [new branch]              cherrypick_166338           -> origin/cherrypick_166338
2025-12-04T08:57:05.5983960Z  * [new branch]              cherrypick_166458           -> origin/cherrypick_166458
2025-12-04T08:57:05.5985623Z  * [new branch]              cherrypick_166586           -> origin/cherrypick_166586
2025-12-04T08:57:05.5987304Z  * [new branch]              cherrypick_166956           -> origin/cherrypick_166956
2025-12-04T08:57:05.5988962Z  * [new branch]              ci_attn                     -> origin/ci_attn
2025-12-04T08:57:05.5990624Z  * [new branch]              codex-testing               -> origin/codex-testing
2025-12-04T08:57:05.5993071Z  * [new branch]              codex/add-check_memory_overlap-helper-functions -> origin/codex/add-check_memory_overlap-helper-functions
2025-12-04T08:57:05.5994513Z  * [new branch]              codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch
2025-12-04T08:57:05.5996636Z  * [new branch]              codex/investigate-segfaults-in-get_tensor_storage_id -> origin/codex/investigate-segfaults-in-get_tensor_storage_id
2025-12-04T08:57:05.5998601Z  * [new branch]              codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run
2025-12-04T08:57:05.6000104Z  * [new branch]              compatiblpy39util           -> origin/compatiblpy39util
2025-12-04T08:57:05.6001832Z  * [new branch]              cond_hop_device             -> origin/cond_hop_device
2025-12-04T08:57:05.6003464Z  * [new branch]              context_test                -> origin/context_test
2025-12-04T08:57:05.6005828Z  * [new branch]              copilot/code-style-cleanup-python-pip -> origin/copilot/code-style-cleanup-python-pip
2025-12-04T08:57:05.6007922Z  * [new branch]              cpio/fix_new_ami_tests      -> origin/cpio/fix_new_ami_tests
2025-12-04T08:57:05.6009611Z  * [new branch]              cpp-docs-dependency-upgrade -> origin/cpp-docs-dependency-upgrade
2025-12-04T08:57:05.6011856Z  * [new branch]              csl/always_produce_xml      -> origin/csl/always_produce_xml
2025-12-04T08:57:05.6013365Z  * [new branch]              csl/build_test_more_procs   -> origin/csl/build_test_more_procs
2025-12-04T08:57:05.6014918Z  * [new branch]              csl/build_test_more_procs2  -> origin/csl/build_test_more_procs2
2025-12-04T08:57:05.6016486Z  * [new branch]              csl/clean_up                -> origin/csl/clean_up
2025-12-04T08:57:05.6018329Z  * [new branch]              csl/fix_retry_segfault_exit -> origin/csl/fix_retry_segfault_exit
2025-12-04T08:57:05.6019865Z  * [new branch]              csl/katex                   -> origin/csl/katex
2025-12-04T08:57:05.6021686Z  * [new branch]              csl/larger_runner           -> origin/csl/larger_runner
2025-12-04T08:57:05.6023680Z  * [new branch]              csl/lint_testing            -> origin/csl/lint_testing
2025-12-04T08:57:05.6025611Z  * [new branch]              csl/lint_thing              -> origin/csl/lint_thing
2025-12-04T08:57:05.6027268Z  * [new branch]              csl/lintrunner_stuff        -> origin/csl/lintrunner_stuff
2025-12-04T08:57:05.6029021Z  * [new branch]              csl/manually_gen_json       -> origin/csl/manually_gen_json
2025-12-04T08:57:05.6030650Z  * [new branch]              csl/mps_sharding            -> origin/csl/mps_sharding
2025-12-04T08:57:05.6032280Z  * [new branch]              csl/multistage_docker       -> origin/csl/multistage_docker
2025-12-04T08:57:05.6033916Z  * [new branch]              csl/print_timing            -> origin/csl/print_timing
2025-12-04T08:57:05.6035672Z  * [new branch]              csl/remove_experiment       -> origin/csl/remove_experiment
2025-12-04T08:57:05.6037336Z  * [new branch]              csl/remove_maybe_unused_var -> origin/csl/remove_maybe_unused_var
2025-12-04T08:57:05.6039041Z  * [new branch]              csl/remove_repo_specific_autolabel -> origin/csl/remove_repo_specific_autolabel
2025-12-04T08:57:05.6040829Z  * [new branch]              csl/remove_run_parallel     -> origin/csl/remove_run_parallel
2025-12-04T08:57:05.6042448Z  * [new branch]              csl/remove_unused_vars      -> origin/csl/remove_unused_vars
2025-12-04T08:57:05.6044042Z  * [new branch]              csl/revert_open             -> origin/csl/revert_open
2025-12-04T08:57:05.6045656Z  * [new branch]              csl/skip_build              -> origin/csl/skip_build
2025-12-04T08:57:05.6047315Z  * [new branch]              csl/smaller_avx_amx_runenrs -> origin/csl/smaller_avx_amx_runenrs
2025-12-04T08:57:05.6048887Z  * [new branch]              csl/td_job_level            -> origin/csl/td_job_level
2025-12-04T08:57:05.6050563Z  * [new branch]              csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner
2025-12-04T08:57:05.6052299Z  * [new branch]              csl/test_owners_autograd_dispatch_nn -> origin/csl/test_owners_autograd_dispatch_nn
2025-12-04T08:57:05.6053901Z  * [new branch]              csl/test_owners_higher_confidence -> origin/csl/test_owners_higher_confidence
2025-12-04T08:57:05.6055864Z  * [new branch]              csl/upload_json_running     -> origin/csl/upload_json_running
2025-12-04T08:57:05.6057903Z  * [new branch]              csl/win_sccache             -> origin/csl/win_sccache
2025-12-04T08:57:05.6059274Z  * [new branch]              csl/xml_stuff               -> origin/csl/xml_stuff
2025-12-04T08:57:05.6061058Z  * [new branch]              cublasrelax2                -> origin/cublasrelax2
2025-12-04T08:57:05.6062669Z  * [new branch]              cuda_mempool                -> origin/cuda_mempool
2025-12-04T08:57:05.6064359Z  * [new branch]              custom_lowering_dict        -> origin/custom_lowering_dict
2025-12-04T08:57:05.6066548Z  * [new branch]              d4l3k/debug_plane_frtrace   -> origin/d4l3k/debug_plane_frtrace
2025-12-04T08:57:05.6068834Z  * [new branch]              daxia6/2.8o3                -> origin/daxia6/2.8o3
2025-12-04T08:57:05.6070374Z  * [new branch]              debug-guard                 -> origin/debug-guard
2025-12-04T08:57:05.6072096Z  * [new branch]              delete-quant-docs           -> origin/delete-quant-docs
2025-12-04T08:57:05.6076974Z  * [new branch]              dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0
2025-12-04T08:57:05.6078684Z  * [new branch]              dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1
2025-12-04T08:57:05.6080830Z  * [new branch]              desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper
2025-12-04T08:57:05.6082535Z  * [new branch]              desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64
2025-12-04T08:57:05.6085038Z  * [new branch]              dev/dhruva/flex_attn_opt    -> origin/dev/dhruva/flex_attn_opt
2025-12-04T08:57:05.6087355Z  * [new branch]              dev/joona/MPSNDArrayAdd     -> origin/dev/joona/MPSNDArrayAdd
2025-12-04T08:57:05.6089083Z  * [new branch]              dev/joona/Unranked          -> origin/dev/joona/Unranked
2025-12-04T08:57:05.6091402Z  * [new branch]              dev/joona/cat               -> origin/dev/joona/cat
2025-12-04T08:57:05.6092589Z  * [new branch]              dev/joona/embeddingbag      -> origin/dev/joona/embeddingbag
2025-12-04T08:57:05.6094447Z  * [new branch]              dev/joona/fix_sdpa_memtest  -> origin/dev/joona/fix_sdpa_memtest
2025-12-04T08:57:05.6096204Z  * [new branch]              dev/joona/getTensorsString  -> origin/dev/joona/getTensorsString
2025-12-04T08:57:05.6098016Z  * [new branch]              dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14
2025-12-04T08:57:05.6099965Z  * [new branch]              dev/joona/scalar_clamp      -> origin/dev/joona/scalar_clamp
2025-12-04T08:57:05.6102080Z  * [new branch]              dev/joona/sdpa              -> origin/dev/joona/sdpa
2025-12-04T08:57:05.6104327Z  * [new branch]              dev/joona/sdpa_api          -> origin/dev/joona/sdpa_api
2025-12-04T08:57:05.6106154Z  * [new branch]              dev/joona/type_inf          -> origin/dev/joona/type_inf
2025-12-04T08:57:05.6107999Z  * [new branch]              dev/joona/ulpAssertClose    -> origin/dev/joona/ulpAssertClose
2025-12-04T08:57:05.6109811Z  * [new branch]              dev/joona/upsize3d          -> origin/dev/joona/upsize3d
2025-12-04T08:57:05.6111405Z  * [new branch]              disp_counter                -> origin/disp_counter
2025-12-04T08:57:05.6113111Z  * [new branch]              divyanshk-patch-1           -> origin/divyanshk-patch-1
2025-12-04T08:57:05.6114716Z  * [new branch]              docs                        -> origin/docs
2025-12-04T08:57:05.6116514Z  * [new branch]              documentation               -> origin/documentation
2025-12-04T08:57:05.6118381Z  * [new branch]              eager_model_benchmarks      -> origin/eager_model_benchmarks
2025-12-04T08:57:05.6120706Z  * [new branch]              embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control
2025-12-04T08:57:05.6122259Z  * [new branch]              embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B
2025-12-04T08:57:05.6123610Z  * [new branch]              embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B
2025-12-04T08:57:05.6125350Z  * [new branch]              eqy-patch-1                 -> origin/eqy-patch-1
2025-12-04T08:57:05.6127081Z  * [new branch]              eqy-patch-2                 -> origin/eqy-patch-2
2025-12-04T08:57:05.6128779Z  * [new branch]              eqy-patch-3                 -> origin/eqy-patch-3
2025-12-04T08:57:05.6130419Z  * [new branch]              eqy-patch-4                 -> origin/eqy-patch-4
2025-12-04T08:57:05.6132094Z  * [new branch]              eqy-patch-5                 -> origin/eqy-patch-5
2025-12-04T08:57:05.6133697Z  * [new branch]              eqy-patch-6                 -> origin/eqy-patch-6
2025-12-04T08:57:05.6135887Z  * [new branch]              exclamaforte/amd-ma         -> origin/exclamaforte/amd-ma
2025-12-04T08:57:05.6137544Z  * [new branch]              exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run
2025-12-04T08:57:05.6138867Z  * [new branch]              exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor
2025-12-04T08:57:05.6140613Z  * [new branch]              exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion
2025-12-04T08:57:05.6142287Z  * [new branch]              exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning
2025-12-04T08:57:05.6144169Z  * [new branch]              exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg
2025-12-04T08:57:05.6146215Z  * [new branch]              exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run
2025-12-04T08:57:05.6147775Z  * [new branch]              exclamaforte/fusion-data    -> origin/exclamaforte/fusion-data
2025-12-04T08:57:05.6149552Z  * [new branch]              exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run
2025-12-04T08:57:05.6151366Z  * [new branch]              exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model
2025-12-04T08:57:05.6152737Z  * [new branch]              exclamaforte/gemm-model     -> origin/exclamaforte/gemm-model
2025-12-04T08:57:05.6154601Z  * [new branch]              exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection
2025-12-04T08:57:05.6156092Z  * [new branch]              exclamaforte/gemm-to-amd    -> origin/exclamaforte/gemm-to-amd
2025-12-04T08:57:05.6157795Z  * [new branch]              exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model
2025-12-04T08:57:05.6159606Z  * [new branch]              exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor
2025-12-04T08:57:05.6161372Z  * [new branch]              exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo
2025-12-04T08:57:05.6163031Z  * [new branch]              exclamaforte/profiler-visualization -> origin/exclamaforte/profiler-visualization
2025-12-04T08:57:05.6164716Z  * [new branch]              exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode
2025-12-04T08:57:05.6166424Z  * [new branch]              exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs
2025-12-04T08:57:05.6168069Z  * [new branch]              exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2
2025-12-04T08:57:05.6169619Z  * [new branch]              exec                        -> origin/exec
2025-12-04T08:57:05.6171481Z  * [new branch]              experimental-mosaic         -> origin/experimental-mosaic
2025-12-04T08:57:05.6173210Z  * [new branch]              export-D61047529            -> origin/export-D61047529
2025-12-04T08:57:05.6174870Z  * [new branch]              export-D71412006            -> origin/export-D71412006
2025-12-04T08:57:05.6176646Z  * [new branch]              export-D73042989            -> origin/export-D73042989
2025-12-04T08:57:05.6178228Z  * [new branch]              export-D78957093            -> origin/export-D78957093
2025-12-04T08:57:05.6179840Z  * [new branch]              export-D78996107            -> origin/export-D78996107
2025-12-04T08:57:05.6181547Z  * [new branch]              export-D80823877            -> origin/export-D80823877
2025-12-04T08:57:05.6183262Z  * [new branch]              export-D80958642            -> origin/export-D80958642
2025-12-04T08:57:05.6184908Z  * [new branch]              export-D81054193            -> origin/export-D81054193
2025-12-04T08:57:05.6186495Z  * [new branch]              export-D81204584            -> origin/export-D81204584
2025-12-04T08:57:05.6188143Z  * [new branch]              export-D81429090            -> origin/export-D81429090
2025-12-04T08:57:05.6189900Z  * [new branch]              export-D82250826            -> origin/export-D82250826
2025-12-04T08:57:05.6191517Z  * [new branch]              export-D82253817            -> origin/export-D82253817
2025-12-04T08:57:05.6193162Z  * [new branch]              export-D83541846            -> origin/export-D83541846
2025-12-04T08:57:05.6194872Z  * [new branch]              export-D83627170            -> origin/export-D83627170
2025-12-04T08:57:05.6196550Z  * [new branch]              export-D83766701            -> origin/export-D83766701
2025-12-04T08:57:05.6198135Z  * [new branch]              export-D83768878            -> origin/export-D83768878
2025-12-04T08:57:05.6199835Z  * [new branch]              export-D83769447            -> origin/export-D83769447
2025-12-04T08:57:05.6201567Z  * [new branch]              export-D84089824            -> origin/export-D84089824
2025-12-04T08:57:05.6203155Z  * [new branch]              export-D84213020            -> origin/export-D84213020
2025-12-04T08:57:05.6205172Z  * [new branch]              export-D84373821            -> origin/export-D84373821
2025-12-04T08:57:05.6206818Z  * [new branch]              export-D84612194            -> origin/export-D84612194
2025-12-04T08:57:05.6208650Z  * [new branch]              export-D84890985            -> origin/export-D84890985
2025-12-04T08:57:05.6210227Z  * [new branch]              export-D85122326            -> origin/export-D85122326
2025-12-04T08:57:05.6211943Z  * [new branch]              export-D86256198            -> origin/export-D86256198
2025-12-04T08:57:05.6214034Z  * [new branch]              export-D86460608            -> origin/export-D86460608
2025-12-04T08:57:05.6215808Z  * [new branch]              export-D86474796            -> origin/export-D86474796
2025-12-04T08:57:05.6217769Z  * [new branch]              export-D86712396            -> origin/export-D86712396
2025-12-04T08:57:05.6219453Z  * [new branch]              export-D87022129            -> origin/export-D87022129
2025-12-04T08:57:05.6221150Z  * [new branch]              export-D87838959            -> origin/export-D87838959
2025-12-04T08:57:05.6222815Z  * [new branch]              export-D88319437            -> origin/export-D88319437
2025-12-04T08:57:05.6224662Z  * [new branch]              exported-model-train-idempotent -> origin/exported-model-train-idempotent
2025-12-04T08:57:05.6226268Z  * [new branch]              ezyang-titan-october        -> origin/ezyang-titan-october
2025-12-04T08:57:05.6227860Z  * [new branch]              ezyang-titan-october2       -> origin/ezyang-titan-october2
2025-12-04T08:57:05.6229431Z  * [new branch]              ezyang-war                  -> origin/ezyang-war
2025-12-04T08:57:05.6231594Z  * [new branch]              ezyang/wip-aot-descriptors  -> origin/ezyang/wip-aot-descriptors
2025-12-04T08:57:05.6233244Z  * [new branch]              fa_u8_brgemm                -> origin/fa_u8_brgemm
2025-12-04T08:57:05.6235436Z  * [new branch]              fadeputr/sequence_fbgemm    -> origin/fadeputr/sequence_fbgemm
2025-12-04T08:57:05.6237129Z  * [new branch]              fastmath_baseline           -> origin/fastmath_baseline
2025-12-04T08:57:05.6239310Z  * [new branch]              fbcode/warm                 -> origin/fbcode/warm
2025-12-04T08:57:05.6241231Z  * [new branch]              fca                         -> origin/fca
2025-12-04T08:57:05.6242873Z  * [new branch]              fca2_ca5984c                -> origin/fca2_ca5984c
2025-12-04T08:57:05.6244454Z  * [new branch]              fca5                        -> origin/fca5
2025-12-04T08:57:05.6246628Z  * [new branch]              feature/justknobs-cpp       -> origin/feature/justknobs-cpp
2025-12-04T08:57:05.6248221Z  * [new branch]              feature/numa-forkserver     -> origin/feature/numa-forkserver
2025-12-04T08:57:05.6250212Z  * [new branch]              ffast_math_baseline         -> origin/ffast_math_baseline
2025-12-04T08:57:05.6251791Z  * [new branch]              ffast_math_target           -> origin/ffast_math_target
2025-12-04T08:57:05.6253951Z  * [new branch]              findhao/base_commit         -> origin/findhao/base_commit
2025-12-04T08:57:05.6255558Z  * [new branch]              findhao/base_commit1        -> origin/findhao/base_commit1
2025-12-04T08:57:05.6257477Z  * [new branch]              findhao/multistream2        -> origin/findhao/multistream2
2025-12-04T08:57:05.6259061Z  * [new branch]              findhao/multistream5        -> origin/findhao/multistream5
2025-12-04T08:57:05.6260572Z  * [new branch]              findhao/multistream6        -> origin/findhao/multistream6
2025-12-04T08:57:05.6262127Z  * [new branch]              findhao/operatorbench3      -> origin/findhao/operatorbench3
2025-12-04T08:57:05.6263699Z  * [new branch]              findhao/operatorbench5      -> origin/findhao/operatorbench5
2025-12-04T08:57:05.6265246Z  * [new branch]              findhao/tritonparse         -> origin/findhao/tritonparse
2025-12-04T08:57:05.6266955Z  * [new branch]              fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format
2025-12-04T08:57:05.6268582Z  * [new branch]              fix-config-ignore           -> origin/fix-config-ignore
2025-12-04T08:57:05.6270214Z  * [new branch]              fix-dict-guard              -> origin/fix-dict-guard
2025-12-04T08:57:05.6272132Z  * [new branch]              fix_addmm_issue             -> origin/fix_addmm_issue
2025-12-04T08:57:05.6273654Z  * [new branch]              fix_amd_missing_cluster_dims -> origin/fix_amd_missing_cluster_dims
2025-12-04T08:57:05.6275239Z  * [new branch]              fix_bench_bwd_pass          -> origin/fix_bench_bwd_pass
2025-12-04T08:57:05.6276809Z  * [new branch]              fix_mem_profiler_config     -> origin/fix_mem_profiler_config
2025-12-04T08:57:05.6278459Z  * [new branch]              fix_nvrtc_discovery         -> origin/fix_nvrtc_discovery
2025-12-04T08:57:05.6280067Z  * [new branch]              fix_op_runner               -> origin/fix_op_runner
2025-12-04T08:57:05.6281832Z  * [new branch]              fix_ubn_159469              -> origin/fix_ubn_159469
2025-12-04T08:57:05.6283541Z  * [new branch]              fixes-triage                -> origin/fixes-triage
2025-12-04T08:57:05.6285169Z  * [new branch]              fixflashinfer               -> origin/fixflashinfer
2025-12-04T08:57:05.6286811Z  * [new branch]              flash_decoding_cpu          -> origin/flash_decoding_cpu
2025-12-04T08:57:05.6288392Z  * [new branch]              flex-flash                  -> origin/flex-flash
2025-12-04T08:57:05.6290130Z  * [new branch]              flex_attention_functorch_grad -> origin/flex_attention_functorch_grad
2025-12-04T08:57:05.6291711Z  * [new branch]              flex_flash                  -> origin/flex_flash
2025-12-04T08:57:05.6294105Z  * [new branch]              fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule
2025-12-04T08:57:05.6295723Z  * [new branch]              fmassa/tests_comm_compute_scheduler -> origin/fmassa/tests_comm_compute_scheduler
2025-12-04T08:57:05.6297308Z  * [new branch]              forkserver_fix              -> origin/forkserver_fix
2025-12-04T08:57:05.6298984Z  * [new branch]              fsdp2_trace_rules           -> origin/fsdp2_trace_rules
2025-12-04T08:57:05.6300631Z  * [new branch]              fx_cpp                      -> origin/fx_cpp
2025-12-04T08:57:05.6302861Z  * [new branch]              fy/fix-win                  -> origin/fy/fix-win
2025-12-04T08:57:05.6304563Z  * [new branch]              galv-patch-1                -> origin/galv-patch-1
2025-12-04T08:57:05.6306949Z  * [new branch]              galv/cudagraphs-conditional-nodes-4 -> origin/galv/cudagraphs-conditional-nodes-4
2025-12-04T08:57:05.6309052Z  * [new branch]              georgehong/cmakelists-patch -> origin/georgehong/cmakelists-patch
2025-12-04T08:57:05.6312200Z  * [new branch]              gh/AlnisM/1/base            -> origin/gh/AlnisM/1/base
2025-12-04T08:57:05.6313803Z  * [new branch]              gh/AlnisM/1/head            -> origin/gh/AlnisM/1/head
2025-12-04T08:57:05.6316445Z  * [new branch]              gh/EikanWang/67/base        -> origin/gh/EikanWang/67/base
2025-12-04T08:57:05.6318489Z  * [new branch]              gh/EikanWang/67/head        -> origin/gh/EikanWang/67/head
2025-12-04T08:57:05.6321629Z  * [new branch]              gh/Gasoonjia/1/base         -> origin/gh/Gasoonjia/1/base
2025-12-04T08:57:05.6323233Z  * [new branch]              gh/Gasoonjia/1/head         -> origin/gh/Gasoonjia/1/head
2025-12-04T08:57:05.6325869Z  * [new branch]              gh/H-Huang/131/base         -> origin/gh/H-Huang/131/base
2025-12-04T08:57:05.6327531Z  * [new branch]              gh/H-Huang/131/head         -> origin/gh/H-Huang/131/head
2025-12-04T08:57:05.6329203Z  * [new branch]              gh/H-Huang/131/orig         -> origin/gh/H-Huang/131/orig
2025-12-04T08:57:05.6331316Z  * [new branch]              gh/H-Huang/132/base         -> origin/gh/H-Huang/132/base
2025-12-04T08:57:05.6332886Z  * [new branch]              gh/H-Huang/132/head         -> origin/gh/H-Huang/132/head
2025-12-04T08:57:05.6334503Z  * [new branch]              gh/H-Huang/132/orig         -> origin/gh/H-Huang/132/orig
2025-12-04T08:57:05.6336681Z  * [new branch]              gh/H-Huang/180/base         -> origin/gh/H-Huang/180/base
2025-12-04T08:57:05.6338498Z  * [new branch]              gh/H-Huang/180/head         -> origin/gh/H-Huang/180/head
2025-12-04T08:57:05.6339796Z  * [new branch]              gh/H-Huang/180/orig         -> origin/gh/H-Huang/180/orig
2025-12-04T08:57:05.6341974Z  * [new branch]              gh/H-Huang/182/base         -> origin/gh/H-Huang/182/base
2025-12-04T08:57:05.6343514Z  * [new branch]              gh/H-Huang/182/head         -> origin/gh/H-Huang/182/head
2025-12-04T08:57:05.6345092Z  * [new branch]              gh/H-Huang/182/orig         -> origin/gh/H-Huang/182/orig
2025-12-04T08:57:05.6347209Z  * [new branch]              gh/H-Huang/226/base         -> origin/gh/H-Huang/226/base
2025-12-04T08:57:05.6349038Z  * [new branch]              gh/H-Huang/226/head         -> origin/gh/H-Huang/226/head
2025-12-04T08:57:05.6350646Z  * [new branch]              gh/H-Huang/226/orig         -> origin/gh/H-Huang/226/orig
2025-12-04T08:57:05.6352788Z  * [new branch]              gh/H-Huang/228/base         -> origin/gh/H-Huang/228/base
2025-12-04T08:57:05.6354405Z  * [new branch]              gh/H-Huang/228/head         -> origin/gh/H-Huang/228/head
2025-12-04T08:57:05.6356004Z  * [new branch]              gh/H-Huang/228/orig         -> origin/gh/H-Huang/228/orig
2025-12-04T08:57:05.6358635Z  * [new branch]              gh/IvanKobzarev/150/base    -> origin/gh/IvanKobzarev/150/base
2025-12-04T08:57:05.6360231Z  * [new branch]              gh/IvanKobzarev/150/head    -> origin/gh/IvanKobzarev/150/head
2025-12-04T08:57:05.6361838Z  * [new branch]              gh/IvanKobzarev/150/orig    -> origin/gh/IvanKobzarev/150/orig
2025-12-04T08:57:05.6364100Z  * [new branch]              gh/IvanKobzarev/157/base    -> origin/gh/IvanKobzarev/157/base
2025-12-04T08:57:05.6365742Z  * [new branch]              gh/IvanKobzarev/157/head    -> origin/gh/IvanKobzarev/157/head
2025-12-04T08:57:05.6367319Z  * [new branch]              gh/IvanKobzarev/157/orig    -> origin/gh/IvanKobzarev/157/orig
2025-12-04T08:57:05.6369536Z  * [new branch]              gh/IvanKobzarev/159/base    -> origin/gh/IvanKobzarev/159/base
2025-12-04T08:57:05.6371117Z  * [new branch]              gh/IvanKobzarev/159/head    -> origin/gh/IvanKobzarev/159/head
2025-12-04T08:57:05.6372715Z  * [new branch]              gh/IvanKobzarev/159/orig    -> origin/gh/IvanKobzarev/159/orig
2025-12-04T08:57:05.6374980Z  * [new branch]              gh/IvanKobzarev/162/base    -> origin/gh/IvanKobzarev/162/base
2025-12-04T08:57:05.6376732Z  * [new branch]              gh/IvanKobzarev/162/head    -> origin/gh/IvanKobzarev/162/head
2025-12-04T08:57:05.6378383Z  * [new branch]              gh/IvanKobzarev/162/orig    -> origin/gh/IvanKobzarev/162/orig
2025-12-04T08:57:05.6380588Z  * [new branch]              gh/IvanKobzarev/163/base    -> origin/gh/IvanKobzarev/163/base
2025-12-04T08:57:05.6382159Z  * [new branch]              gh/IvanKobzarev/163/head    -> origin/gh/IvanKobzarev/163/head
2025-12-04T08:57:05.6383727Z  * [new branch]              gh/IvanKobzarev/163/orig    -> origin/gh/IvanKobzarev/163/orig
2025-12-04T08:57:05.6385967Z  * [new branch]              gh/IvanKobzarev/166/base    -> origin/gh/IvanKobzarev/166/base
2025-12-04T08:57:05.6387566Z  * [new branch]              gh/IvanKobzarev/166/head    -> origin/gh/IvanKobzarev/166/head
2025-12-04T08:57:05.6389215Z  * [new branch]              gh/IvanKobzarev/166/orig    -> origin/gh/IvanKobzarev/166/orig
2025-12-04T08:57:05.6391386Z  * [new branch]              gh/IvanKobzarev/167/base    -> origin/gh/IvanKobzarev/167/base
2025-12-04T08:57:05.6392900Z  * [new branch]              gh/IvanKobzarev/167/head    -> origin/gh/IvanKobzarev/167/head
2025-12-04T08:57:05.6394456Z  * [new branch]              gh/IvanKobzarev/167/orig    -> origin/gh/IvanKobzarev/167/orig
2025-12-04T08:57:05.6396595Z  * [new branch]              gh/IvanKobzarev/168/base    -> origin/gh/IvanKobzarev/168/base
2025-12-04T08:57:05.6398193Z  * [new branch]              gh/IvanKobzarev/168/head    -> origin/gh/IvanKobzarev/168/head
2025-12-04T08:57:05.6399955Z  * [new branch]              gh/IvanKobzarev/168/orig    -> origin/gh/IvanKobzarev/168/orig
2025-12-04T08:57:05.6402138Z  * [new branch]              gh/IvanKobzarev/169/base    -> origin/gh/IvanKobzarev/169/base
2025-12-04T08:57:05.6403689Z  * [new branch]              gh/IvanKobzarev/169/head    -> origin/gh/IvanKobzarev/169/head
2025-12-04T08:57:05.6405246Z  * [new branch]              gh/IvanKobzarev/169/orig    -> origin/gh/IvanKobzarev/169/orig
2025-12-04T08:57:05.6407272Z  * [new branch]              gh/IvanKobzarev/170/base    -> origin/gh/IvanKobzarev/170/base
2025-12-04T08:57:05.6408849Z  * [new branch]              gh/IvanKobzarev/170/head    -> origin/gh/IvanKobzarev/170/head
2025-12-04T08:57:05.6410491Z  * [new branch]              gh/IvanKobzarev/170/orig    -> origin/gh/IvanKobzarev/170/orig
2025-12-04T08:57:05.6412765Z  * [new branch]              gh/IvanKobzarev/171/base    -> origin/gh/IvanKobzarev/171/base
2025-12-04T08:57:05.6414343Z  * [new branch]              gh/IvanKobzarev/171/head    -> origin/gh/IvanKobzarev/171/head
2025-12-04T08:57:05.6415924Z  * [new branch]              gh/IvanKobzarev/171/orig    -> origin/gh/IvanKobzarev/171/orig
2025-12-04T08:57:05.6418325Z  * [new branch]              gh/IvanKobzarev/172/base    -> origin/gh/IvanKobzarev/172/base
2025-12-04T08:57:05.6419979Z  * [new branch]              gh/IvanKobzarev/172/head    -> origin/gh/IvanKobzarev/172/head
2025-12-04T08:57:05.6421599Z  * [new branch]              gh/IvanKobzarev/172/orig    -> origin/gh/IvanKobzarev/172/orig
2025-12-04T08:57:05.6423853Z  * [new branch]              gh/IvanKobzarev/173/base    -> origin/gh/IvanKobzarev/173/base
2025-12-04T08:57:05.6425419Z  * [new branch]              gh/IvanKobzarev/173/head    -> origin/gh/IvanKobzarev/173/head
2025-12-04T08:57:05.6426996Z  * [new branch]              gh/IvanKobzarev/173/orig    -> origin/gh/IvanKobzarev/173/orig
2025-12-04T08:57:05.6429240Z  * [new branch]              gh/IvanKobzarev/174/base    -> origin/gh/IvanKobzarev/174/base
2025-12-04T08:57:05.6430885Z  * [new branch]              gh/IvanKobzarev/174/head    -> origin/gh/IvanKobzarev/174/head
2025-12-04T08:57:05.6432496Z  * [new branch]              gh/IvanKobzarev/174/orig    -> origin/gh/IvanKobzarev/174/orig
2025-12-04T08:57:05.6434648Z  * [new branch]              gh/IvanKobzarev/175/base    -> origin/gh/IvanKobzarev/175/base
2025-12-04T08:57:05.6436405Z  * [new branch]              gh/IvanKobzarev/175/head    -> origin/gh/IvanKobzarev/175/head
2025-12-04T08:57:05.6438008Z  * [new branch]              gh/IvanKobzarev/175/orig    -> origin/gh/IvanKobzarev/175/orig
2025-12-04T08:57:05.6440483Z  * [new branch]              gh/IvanKobzarev/176/base    -> origin/gh/IvanKobzarev/176/base
2025-12-04T08:57:05.6442116Z  * [new branch]              gh/IvanKobzarev/176/head    -> origin/gh/IvanKobzarev/176/head
2025-12-04T08:57:05.6443680Z  * [new branch]              gh/IvanKobzarev/176/orig    -> origin/gh/IvanKobzarev/176/orig
2025-12-04T08:57:05.6446146Z  * [new branch]              gh/IvanKobzarev/177/base    -> origin/gh/IvanKobzarev/177/base
2025-12-04T08:57:05.6447765Z  * [new branch]              gh/IvanKobzarev/177/head    -> origin/gh/IvanKobzarev/177/head
2025-12-04T08:57:05.6449412Z  * [new branch]              gh/IvanKobzarev/177/orig    -> origin/gh/IvanKobzarev/177/orig
2025-12-04T08:57:05.6451714Z  * [new branch]              gh/IvanKobzarev/178/base    -> origin/gh/IvanKobzarev/178/base
2025-12-04T08:57:05.6453350Z  * [new branch]              gh/IvanKobzarev/178/head    -> origin/gh/IvanKobzarev/178/head
2025-12-04T08:57:05.6454927Z  * [new branch]              gh/IvanKobzarev/178/orig    -> origin/gh/IvanKobzarev/178/orig
2025-12-04T08:57:05.6457198Z  * [new branch]              gh/IvanKobzarev/179/base    -> origin/gh/IvanKobzarev/179/base
2025-12-04T08:57:05.6458761Z  * [new branch]              gh/IvanKobzarev/179/head    -> origin/gh/IvanKobzarev/179/head
2025-12-04T08:57:05.6460364Z  * [new branch]              gh/IvanKobzarev/179/orig    -> origin/gh/IvanKobzarev/179/orig
2025-12-04T08:57:05.6462878Z  * [new branch]              gh/IvanKobzarev/180/base    -> origin/gh/IvanKobzarev/180/base
2025-12-04T08:57:05.6464282Z  * [new branch]              gh/IvanKobzarev/180/head    -> origin/gh/IvanKobzarev/180/head
2025-12-04T08:57:05.6465968Z  * [new branch]              gh/IvanKobzarev/180/orig    -> origin/gh/IvanKobzarev/180/orig
2025-12-04T08:57:05.6468295Z  * [new branch]              gh/IvanKobzarev/181/base    -> origin/gh/IvanKobzarev/181/base
2025-12-04T08:57:05.6469907Z  * [new branch]              gh/IvanKobzarev/181/head    -> origin/gh/IvanKobzarev/181/head
2025-12-04T08:57:05.6471588Z  * [new branch]              gh/IvanKobzarev/181/orig    -> origin/gh/IvanKobzarev/181/orig
2025-12-04T08:57:05.6473938Z  * [new branch]              gh/IvanKobzarev/182/base    -> origin/gh/IvanKobzarev/182/base
2025-12-04T08:57:05.6475496Z  * [new branch]              gh/IvanKobzarev/182/head    -> origin/gh/IvanKobzarev/182/head
2025-12-04T08:57:05.6477075Z  * [new branch]              gh/IvanKobzarev/182/orig    -> origin/gh/IvanKobzarev/182/orig
2025-12-04T08:57:05.6479460Z  * [new branch]              gh/IvanKobzarev/183/base    -> origin/gh/IvanKobzarev/183/base
2025-12-04T08:57:05.6481235Z  * [new branch]              gh/IvanKobzarev/183/head    -> origin/gh/IvanKobzarev/183/head
2025-12-04T08:57:05.6482847Z  * [new branch]              gh/IvanKobzarev/183/orig    -> origin/gh/IvanKobzarev/183/orig
2025-12-04T08:57:05.6485258Z  * [new branch]              gh/IvanKobzarev/184/base    -> origin/gh/IvanKobzarev/184/base
2025-12-04T08:57:05.6486853Z  * [new branch]              gh/IvanKobzarev/184/head    -> origin/gh/IvanKobzarev/184/head
2025-12-04T08:57:05.6488502Z  * [new branch]              gh/IvanKobzarev/184/orig    -> origin/gh/IvanKobzarev/184/orig
2025-12-04T08:57:05.6491134Z  * [new branch]              gh/NikhilAPatel/1/base      -> origin/gh/NikhilAPatel/1/base
2025-12-04T08:57:05.6492833Z  * [new branch]              gh/NikhilAPatel/1/head      -> origin/gh/NikhilAPatel/1/head
2025-12-04T08:57:05.6494916Z  * [new branch]              gh/NikhilAPatel/2/base      -> origin/gh/NikhilAPatel/2/base
2025-12-04T08:57:05.6496460Z  * [new branch]              gh/NikhilAPatel/2/head      -> origin/gh/NikhilAPatel/2/head
2025-12-04T08:57:05.6498812Z  * [new branch]              gh/NikhilAPatel/4/base      -> origin/gh/NikhilAPatel/4/base
2025-12-04T08:57:05.6500445Z  * [new branch]              gh/NikhilAPatel/4/head      -> origin/gh/NikhilAPatel/4/head
2025-12-04T08:57:05.6502553Z  * [new branch]              gh/NikhilAPatel/5/base      -> origin/gh/NikhilAPatel/5/base
2025-12-04T08:57:05.6504188Z  * [new branch]              gh/NikhilAPatel/5/head      -> origin/gh/NikhilAPatel/5/head
2025-12-04T08:57:05.6505788Z  * [new branch]              gh/NikhilAPatel/5/orig      -> origin/gh/NikhilAPatel/5/orig
2025-12-04T08:57:05.6508363Z  * [new branch]              gh/PaliC/17/base            -> origin/gh/PaliC/17/base
2025-12-04T08:57:05.6509997Z  * [new branch]              gh/PaliC/17/head            -> origin/gh/PaliC/17/head
2025-12-04T08:57:05.6511616Z  * [new branch]              gh/PaliC/17/orig            -> origin/gh/PaliC/17/orig
2025-12-04T08:57:05.6513826Z  * [new branch]              gh/PaliC/18/base            -> origin/gh/PaliC/18/base
2025-12-04T08:57:05.6528977Z  * [new branch]              gh/PaliC/18/head            -> origin/gh/PaliC/18/head
2025-12-04T08:57:05.6529399Z  * [new branch]              gh/PaliC/18/orig            -> origin/gh/PaliC/18/orig
2025-12-04T08:57:05.6529908Z  * [new branch]              gh/PaliC/20/base            -> origin/gh/PaliC/20/base
2025-12-04T08:57:05.6530350Z  * [new branch]              gh/PaliC/20/head            -> origin/gh/PaliC/20/head
2025-12-04T08:57:05.6530706Z  * [new branch]              gh/PaliC/20/orig            -> origin/gh/PaliC/20/orig
2025-12-04T08:57:05.6531054Z  * [new branch]              gh/PaliC/21/base            -> origin/gh/PaliC/21/base
2025-12-04T08:57:05.6531587Z  * [new branch]              gh/PaliC/21/head            -> origin/gh/PaliC/21/head
2025-12-04T08:57:05.6532130Z  * [new branch]              gh/PaliC/21/orig            -> origin/gh/PaliC/21/orig
2025-12-04T08:57:05.6532614Z  * [new branch]              gh/PaliC/23/base            -> origin/gh/PaliC/23/base
2025-12-04T08:57:05.6533101Z  * [new branch]              gh/PaliC/23/head            -> origin/gh/PaliC/23/head
2025-12-04T08:57:05.6533552Z  * [new branch]              gh/PaliC/23/orig            -> origin/gh/PaliC/23/orig
2025-12-04T08:57:05.6535455Z  * [new branch]              gh/PaliC/24/base            -> origin/gh/PaliC/24/base
2025-12-04T08:57:05.6536992Z  * [new branch]              gh/PaliC/24/head            -> origin/gh/PaliC/24/head
2025-12-04T08:57:05.6538563Z  * [new branch]              gh/PaliC/24/orig            -> origin/gh/PaliC/24/orig
2025-12-04T08:57:05.6540674Z  * [new branch]              gh/PaliC/25/head            -> origin/gh/PaliC/25/head
2025-12-04T08:57:05.6542218Z  * [new branch]              gh/PaliC/25/next            -> origin/gh/PaliC/25/next
2025-12-04T08:57:05.6543856Z  * [new branch]              gh/PaliC/25/orig            -> origin/gh/PaliC/25/orig
2025-12-04T08:57:05.6546047Z  * [new branch]              gh/PaliC/26/head            -> origin/gh/PaliC/26/head
2025-12-04T08:57:05.6547496Z  * [new branch]              gh/PaliC/26/next            -> origin/gh/PaliC/26/next
2025-12-04T08:57:05.6549063Z  * [new branch]              gh/PaliC/26/orig            -> origin/gh/PaliC/26/orig
2025-12-04T08:57:05.6551263Z  * [new branch]              gh/PaliC/27/next            -> origin/gh/PaliC/27/next
2025-12-04T08:57:05.6553468Z  * [new branch]              gh/PaliC/28/head            -> origin/gh/PaliC/28/head
2025-12-04T08:57:05.6554945Z  * [new branch]              gh/PaliC/28/next            -> origin/gh/PaliC/28/next
2025-12-04T08:57:05.6556547Z  * [new branch]              gh/PaliC/28/orig            -> origin/gh/PaliC/28/orig
2025-12-04T08:57:05.6558724Z  * [new branch]              gh/PaliC/29/head            -> origin/gh/PaliC/29/head
2025-12-04T08:57:05.6560310Z  * [new branch]              gh/PaliC/29/next            -> origin/gh/PaliC/29/next
2025-12-04T08:57:05.6561859Z  * [new branch]              gh/PaliC/29/orig            -> origin/gh/PaliC/29/orig
2025-12-04T08:57:05.6564001Z  * [new branch]              gh/PaliC/30/head            -> origin/gh/PaliC/30/head
2025-12-04T08:57:05.6565441Z  * [new branch]              gh/PaliC/30/next            -> origin/gh/PaliC/30/next
2025-12-04T08:57:05.6567070Z  * [new branch]              gh/PaliC/30/orig            -> origin/gh/PaliC/30/orig
2025-12-04T08:57:05.6569199Z  * [new branch]              gh/PaliC/31/head            -> origin/gh/PaliC/31/head
2025-12-04T08:57:05.6570764Z  * [new branch]              gh/PaliC/31/next            -> origin/gh/PaliC/31/next
2025-12-04T08:57:05.6572368Z  * [new branch]              gh/PaliC/31/orig            -> origin/gh/PaliC/31/orig
2025-12-04T08:57:05.6574881Z  * [new branch]              gh/PaulZhang12/25/base      -> origin/gh/PaulZhang12/25/base
2025-12-04T08:57:05.6576581Z  * [new branch]              gh/PaulZhang12/25/head      -> origin/gh/PaulZhang12/25/head
2025-12-04T08:57:05.6578244Z  * [new branch]              gh/PaulZhang12/25/orig      -> origin/gh/PaulZhang12/25/orig
2025-12-04T08:57:05.6580416Z  * [new branch]              gh/PaulZhang12/28/base      -> origin/gh/PaulZhang12/28/base
2025-12-04T08:57:05.6582063Z  * [new branch]              gh/PaulZhang12/28/head      -> origin/gh/PaulZhang12/28/head
2025-12-04T08:57:05.6583674Z  * [new branch]              gh/PaulZhang12/28/orig      -> origin/gh/PaulZhang12/28/orig
2025-12-04T08:57:05.6586014Z  * [new branch]              gh/PaulZhang12/31/base      -> origin/gh/PaulZhang12/31/base
2025-12-04T08:57:05.6587543Z  * [new branch]              gh/PaulZhang12/31/head      -> origin/gh/PaulZhang12/31/head
2025-12-04T08:57:05.6589277Z  * [new branch]              gh/PaulZhang12/31/orig      -> origin/gh/PaulZhang12/31/orig
2025-12-04T08:57:05.6591750Z  * [new branch]              gh/PaulZhang12/37/base      -> origin/gh/PaulZhang12/37/base
2025-12-04T08:57:05.6593873Z  * [new branch]              gh/PaulZhang12/37/head      -> origin/gh/PaulZhang12/37/head
2025-12-04T08:57:05.6595187Z  * [new branch]              gh/PaulZhang12/37/orig      -> origin/gh/PaulZhang12/37/orig
2025-12-04T08:57:05.6597420Z  * [new branch]              gh/PaulZhang12/40/base      -> origin/gh/PaulZhang12/40/base
2025-12-04T08:57:05.6599045Z  * [new branch]              gh/PaulZhang12/40/head      -> origin/gh/PaulZhang12/40/head
2025-12-04T08:57:05.6600738Z  * [new branch]              gh/PaulZhang12/40/orig      -> origin/gh/PaulZhang12/40/orig
2025-12-04T08:57:05.6603009Z  * [new branch]              gh/PaulZhang12/42/base      -> origin/gh/PaulZhang12/42/base
2025-12-04T08:57:05.6604571Z  * [new branch]              gh/PaulZhang12/42/head      -> origin/gh/PaulZhang12/42/head
2025-12-04T08:57:05.6606697Z  * [new branch]              gh/PaulZhang12/43/base      -> origin/gh/PaulZhang12/43/base
2025-12-04T08:57:05.6608346Z  * [new branch]              gh/PaulZhang12/43/head      -> origin/gh/PaulZhang12/43/head
2025-12-04T08:57:05.6609914Z  * [new branch]              gh/PaulZhang12/43/orig      -> origin/gh/PaulZhang12/43/orig
2025-12-04T08:57:05.6611934Z  * [new branch]              gh/PaulZhang12/44/base      -> origin/gh/PaulZhang12/44/base
2025-12-04T08:57:05.6613581Z  * [new branch]              gh/PaulZhang12/44/head      -> origin/gh/PaulZhang12/44/head
2025-12-04T08:57:05.6615772Z  * [new branch]              gh/PaulZhang12/45/base      -> origin/gh/PaulZhang12/45/base
2025-12-04T08:57:05.6617452Z  * [new branch]              gh/PaulZhang12/45/head      -> origin/gh/PaulZhang12/45/head
2025-12-04T08:57:05.6619796Z  * [new branch]              gh/PaulZhang12/45/orig      -> origin/gh/PaulZhang12/45/orig
2025-12-04T08:57:05.6622006Z  * [new branch]              gh/PaulZhang12/46/base      -> origin/gh/PaulZhang12/46/base
2025-12-04T08:57:05.6623638Z  * [new branch]              gh/PaulZhang12/46/head      -> origin/gh/PaulZhang12/46/head
2025-12-04T08:57:05.6625252Z  * [new branch]              gh/PaulZhang12/46/orig      -> origin/gh/PaulZhang12/46/orig
2025-12-04T08:57:05.6627438Z  * [new branch]              gh/PaulZhang12/47/base      -> origin/gh/PaulZhang12/47/base
2025-12-04T08:57:05.6629127Z  * [new branch]              gh/PaulZhang12/47/head      -> origin/gh/PaulZhang12/47/head
2025-12-04T08:57:05.6630938Z  * [new branch]              gh/PaulZhang12/47/orig      -> origin/gh/PaulZhang12/47/orig
2025-12-04T08:57:05.6632973Z  * [new branch]              gh/PaulZhang12/48/base      -> origin/gh/PaulZhang12/48/base
2025-12-04T08:57:05.6634562Z  * [new branch]              gh/PaulZhang12/48/head      -> origin/gh/PaulZhang12/48/head
2025-12-04T08:57:05.6636156Z  * [new branch]              gh/PaulZhang12/48/orig      -> origin/gh/PaulZhang12/48/orig
2025-12-04T08:57:05.6638707Z  * [new branch]              gh/SamGinzburg/11/base      -> origin/gh/SamGinzburg/11/base
2025-12-04T08:57:05.6640404Z  * [new branch]              gh/SamGinzburg/11/head      -> origin/gh/SamGinzburg/11/head
2025-12-04T08:57:05.6643184Z  * [new branch]              gh/SherlockNoMad/1/base     -> origin/gh/SherlockNoMad/1/base
2025-12-04T08:57:05.6644762Z  * [new branch]              gh/SherlockNoMad/1/head     -> origin/gh/SherlockNoMad/1/head
2025-12-04T08:57:05.6647029Z  * [new branch]              gh/SherlockNoMad/10/base    -> origin/gh/SherlockNoMad/10/base
2025-12-04T08:57:05.6648666Z  * [new branch]              gh/SherlockNoMad/10/head    -> origin/gh/SherlockNoMad/10/head
2025-12-04T08:57:05.6650345Z  * [new branch]              gh/SherlockNoMad/10/orig    -> origin/gh/SherlockNoMad/10/orig
2025-12-04T08:57:05.6652375Z  * [new branch]              gh/SherlockNoMad/11/base    -> origin/gh/SherlockNoMad/11/base
2025-12-04T08:57:05.6654015Z  * [new branch]              gh/SherlockNoMad/11/head    -> origin/gh/SherlockNoMad/11/head
2025-12-04T08:57:05.6655708Z  * [new branch]              gh/SherlockNoMad/11/orig    -> origin/gh/SherlockNoMad/11/orig
2025-12-04T08:57:05.6657866Z  * [new branch]              gh/SherlockNoMad/12/base    -> origin/gh/SherlockNoMad/12/base
2025-12-04T08:57:05.6659268Z  * [new branch]              gh/SherlockNoMad/12/head    -> origin/gh/SherlockNoMad/12/head
2025-12-04T08:57:05.6660947Z  * [new branch]              gh/SherlockNoMad/12/orig    -> origin/gh/SherlockNoMad/12/orig
2025-12-04T08:57:05.6663124Z  * [new branch]              gh/SherlockNoMad/15/base    -> origin/gh/SherlockNoMad/15/base
2025-12-04T08:57:05.6664730Z  * [new branch]              gh/SherlockNoMad/15/head    -> origin/gh/SherlockNoMad/15/head
2025-12-04T08:57:05.6666357Z  * [new branch]              gh/SherlockNoMad/15/orig    -> origin/gh/SherlockNoMad/15/orig
2025-12-04T08:57:05.6668540Z  * [new branch]              gh/SherlockNoMad/17/base    -> origin/gh/SherlockNoMad/17/base
2025-12-04T08:57:05.6670105Z  * [new branch]              gh/SherlockNoMad/17/head    -> origin/gh/SherlockNoMad/17/head
2025-12-04T08:57:05.6671682Z  * [new branch]              gh/SherlockNoMad/17/orig    -> origin/gh/SherlockNoMad/17/orig
2025-12-04T08:57:05.6673929Z  * [new branch]              gh/SherlockNoMad/18/base    -> origin/gh/SherlockNoMad/18/base
2025-12-04T08:57:05.6675727Z  * [new branch]              gh/SherlockNoMad/18/head    -> origin/gh/SherlockNoMad/18/head
2025-12-04T08:57:05.6677380Z  * [new branch]              gh/SherlockNoMad/18/orig    -> origin/gh/SherlockNoMad/18/orig
2025-12-04T08:57:05.6679336Z  * [new branch]              gh/SherlockNoMad/19/base    -> origin/gh/SherlockNoMad/19/base
2025-12-04T08:57:05.6681085Z  * [new branch]              gh/SherlockNoMad/19/head    -> origin/gh/SherlockNoMad/19/head
2025-12-04T08:57:05.6682752Z  * [new branch]              gh/SherlockNoMad/19/orig    -> origin/gh/SherlockNoMad/19/orig
2025-12-04T08:57:05.6684745Z  * [new branch]              gh/SherlockNoMad/2/base     -> origin/gh/SherlockNoMad/2/base
2025-12-04T08:57:05.6686254Z  * [new branch]              gh/SherlockNoMad/2/head     -> origin/gh/SherlockNoMad/2/head
2025-12-04T08:57:05.6688307Z  * [new branch]              gh/SherlockNoMad/20/base    -> origin/gh/SherlockNoMad/20/base
2025-12-04T08:57:05.6690023Z  * [new branch]              gh/SherlockNoMad/20/head    -> origin/gh/SherlockNoMad/20/head
2025-12-04T08:57:05.6691547Z  * [new branch]              gh/SherlockNoMad/20/orig    -> origin/gh/SherlockNoMad/20/orig
2025-12-04T08:57:05.6693868Z  * [new branch]              gh/SherlockNoMad/21/base    -> origin/gh/SherlockNoMad/21/base
2025-12-04T08:57:05.6695553Z  * [new branch]              gh/SherlockNoMad/21/head    -> origin/gh/SherlockNoMad/21/head
2025-12-04T08:57:05.6697096Z  * [new branch]              gh/SherlockNoMad/21/orig    -> origin/gh/SherlockNoMad/21/orig
2025-12-04T08:57:05.6699078Z  * [new branch]              gh/SherlockNoMad/3/base     -> origin/gh/SherlockNoMad/3/base
2025-12-04T08:57:05.6700659Z  * [new branch]              gh/SherlockNoMad/3/head     -> origin/gh/SherlockNoMad/3/head
2025-12-04T08:57:05.6702695Z  * [new branch]              gh/SherlockNoMad/4/base     -> origin/gh/SherlockNoMad/4/base
2025-12-04T08:57:05.6704223Z  * [new branch]              gh/SherlockNoMad/4/head     -> origin/gh/SherlockNoMad/4/head
2025-12-04T08:57:05.6706188Z  * [new branch]              gh/SherlockNoMad/5/base     -> origin/gh/SherlockNoMad/5/base
2025-12-04T08:57:05.6707751Z  * [new branch]              gh/SherlockNoMad/5/head     -> origin/gh/SherlockNoMad/5/head
2025-12-04T08:57:05.6710845Z  * [new branch]              gh/Sidharth123-cpu/24/base  -> origin/gh/Sidharth123-cpu/24/base
2025-12-04T08:57:05.6712942Z  * [new branch]              gh/Sidharth123-cpu/25/base  -> origin/gh/Sidharth123-cpu/25/base
2025-12-04T08:57:05.6714936Z  * [new branch]              gh/Sidharth123-cpu/26/base  -> origin/gh/Sidharth123-cpu/26/base
2025-12-04T08:57:05.6717252Z  * [new branch]              gh/Sidharth123-cpu/27/base  -> origin/gh/Sidharth123-cpu/27/base
2025-12-04T08:57:05.6720095Z  * [new branch]              gh/StrongerXi/1/base        -> origin/gh/StrongerXi/1/base
2025-12-04T08:57:05.6721879Z  * [new branch]              gh/StrongerXi/1/head        -> origin/gh/StrongerXi/1/head
2025-12-04T08:57:05.6723894Z  * [new branch]              gh/StrongerXi/71/base       -> origin/gh/StrongerXi/71/base
2025-12-04T08:57:05.6725504Z  * [new branch]              gh/StrongerXi/71/head       -> origin/gh/StrongerXi/71/head
2025-12-04T08:57:05.6727528Z  * [new branch]              gh/StrongerXi/72/base       -> origin/gh/StrongerXi/72/base
2025-12-04T08:57:05.6729156Z  * [new branch]              gh/StrongerXi/72/head       -> origin/gh/StrongerXi/72/head
2025-12-04T08:57:05.6731244Z  * [new branch]              gh/StrongerXi/73/base       -> origin/gh/StrongerXi/73/base
2025-12-04T08:57:05.6732749Z  * [new branch]              gh/StrongerXi/73/head       -> origin/gh/StrongerXi/73/head
2025-12-04T08:57:05.6734483Z  * [new branch]              gh/StrongerXi/73/orig       -> origin/gh/StrongerXi/73/orig
2025-12-04T08:57:05.6737194Z  * [new branch]              gh/XilunWu/160/base         -> origin/gh/XilunWu/160/base
2025-12-04T08:57:05.6738612Z  * [new branch]              gh/XilunWu/160/head         -> origin/gh/XilunWu/160/head
2025-12-04T08:57:05.6740270Z  * [new branch]              gh/XilunWu/160/orig         -> origin/gh/XilunWu/160/orig
2025-12-04T08:57:05.6742382Z  * [new branch]              gh/XilunWu/163/base         -> origin/gh/XilunWu/163/base
2025-12-04T08:57:05.6744021Z  * [new branch]              gh/XilunWu/163/head         -> origin/gh/XilunWu/163/head
2025-12-04T08:57:05.6745653Z  * [new branch]              gh/XilunWu/163/orig         -> origin/gh/XilunWu/163/orig
2025-12-04T08:57:05.6747870Z  * [new branch]              gh/XilunWu/168/base         -> origin/gh/XilunWu/168/base
2025-12-04T08:57:05.6749421Z  * [new branch]              gh/XilunWu/168/head         -> origin/gh/XilunWu/168/head
2025-12-04T08:57:05.6750949Z  * [new branch]              gh/XilunWu/168/orig         -> origin/gh/XilunWu/168/orig
2025-12-04T08:57:05.6753165Z  * [new branch]              gh/XilunWu/169/base         -> origin/gh/XilunWu/169/base
2025-12-04T08:57:05.6754714Z  * [new branch]              gh/XilunWu/169/head         -> origin/gh/XilunWu/169/head
2025-12-04T08:57:05.6756323Z  * [new branch]              gh/XilunWu/169/orig         -> origin/gh/XilunWu/169/orig
2025-12-04T08:57:05.6758316Z  * [new branch]              gh/XilunWu/170/base         -> origin/gh/XilunWu/170/base
2025-12-04T08:57:05.6759961Z  * [new branch]              gh/XilunWu/170/head         -> origin/gh/XilunWu/170/head
2025-12-04T08:57:05.6761607Z  * [new branch]              gh/XilunWu/170/orig         -> origin/gh/XilunWu/170/orig
2025-12-04T08:57:05.6763810Z  * [new branch]              gh/XilunWu/171/base         -> origin/gh/XilunWu/171/base
2025-12-04T08:57:05.6765423Z  * [new branch]              gh/XilunWu/171/head         -> origin/gh/XilunWu/171/head
2025-12-04T08:57:05.6767014Z  * [new branch]              gh/XilunWu/171/orig         -> origin/gh/XilunWu/171/orig
2025-12-04T08:57:05.6769067Z  * [new branch]              gh/XilunWu/173/base         -> origin/gh/XilunWu/173/base
2025-12-04T08:57:05.6770682Z  * [new branch]              gh/XilunWu/173/head         -> origin/gh/XilunWu/173/head
2025-12-04T08:57:05.6772280Z  * [new branch]              gh/XilunWu/173/orig         -> origin/gh/XilunWu/173/orig
2025-12-04T08:57:05.6774434Z  * [new branch]              gh/XilunWu/175/base         -> origin/gh/XilunWu/175/base
2025-12-04T08:57:05.6775959Z  * [new branch]              gh/XilunWu/175/head         -> origin/gh/XilunWu/175/head
2025-12-04T08:57:05.6777565Z  * [new branch]              gh/XilunWu/175/orig         -> origin/gh/XilunWu/175/orig
2025-12-04T08:57:05.6779731Z  * [new branch]              gh/XilunWu/176/base         -> origin/gh/XilunWu/176/base
2025-12-04T08:57:05.6781448Z  * [new branch]              gh/XilunWu/176/head         -> origin/gh/XilunWu/176/head
2025-12-04T08:57:05.6783140Z  * [new branch]              gh/XilunWu/176/orig         -> origin/gh/XilunWu/176/orig
2025-12-04T08:57:05.6785854Z  * [new branch]              gh/XuehaiPan/14/base        -> origin/gh/XuehaiPan/14/base
2025-12-04T08:57:05.6787405Z  * [new branch]              gh/XuehaiPan/14/head        -> origin/gh/XuehaiPan/14/head
2025-12-04T08:57:05.6788953Z  * [new branch]              gh/XuehaiPan/14/orig        -> origin/gh/XuehaiPan/14/orig
2025-12-04T08:57:05.6791092Z  * [new branch]              gh/XuehaiPan/179/base       -> origin/gh/XuehaiPan/179/base
2025-12-04T08:57:05.6792714Z  * [new branch]              gh/XuehaiPan/179/head       -> origin/gh/XuehaiPan/179/head
2025-12-04T08:57:05.6794385Z  * [new branch]              gh/XuehaiPan/179/orig       -> origin/gh/XuehaiPan/179/orig
2025-12-04T08:57:05.6796429Z  * [new branch]              gh/XuehaiPan/249/base       -> origin/gh/XuehaiPan/249/base
2025-12-04T08:57:05.6798000Z  * [new branch]              gh/XuehaiPan/249/head       -> origin/gh/XuehaiPan/249/head
2025-12-04T08:57:05.6799704Z  * [new branch]              gh/XuehaiPan/249/orig       -> origin/gh/XuehaiPan/249/orig
2025-12-04T08:57:05.6801939Z  * [new branch]              gh/XuehaiPan/253/base       -> origin/gh/XuehaiPan/253/base
2025-12-04T08:57:05.6803511Z  * [new branch]              gh/XuehaiPan/253/head       -> origin/gh/XuehaiPan/253/head
2025-12-04T08:57:05.6805081Z  * [new branch]              gh/XuehaiPan/253/orig       -> origin/gh/XuehaiPan/253/orig
2025-12-04T08:57:05.6807254Z  * [new branch]              gh/XuehaiPan/254/base       -> origin/gh/XuehaiPan/254/base
2025-12-04T08:57:05.6808884Z  * [new branch]              gh/XuehaiPan/254/head       -> origin/gh/XuehaiPan/254/head
2025-12-04T08:57:05.6810505Z  * [new branch]              gh/XuehaiPan/254/orig       -> origin/gh/XuehaiPan/254/orig
2025-12-04T08:57:05.6812607Z  * [new branch]              gh/XuehaiPan/255/base       -> origin/gh/XuehaiPan/255/base
2025-12-04T08:57:05.6814181Z  * [new branch]              gh/XuehaiPan/255/head       -> origin/gh/XuehaiPan/255/head
2025-12-04T08:57:05.6815790Z  * [new branch]              gh/XuehaiPan/255/orig       -> origin/gh/XuehaiPan/255/orig
2025-12-04T08:57:05.6818113Z  * [new branch]              gh/XuehaiPan/271/base       -> origin/gh/XuehaiPan/271/base
2025-12-04T08:57:05.6819696Z  * [new branch]              gh/XuehaiPan/271/head       -> origin/gh/XuehaiPan/271/head
2025-12-04T08:57:05.6821329Z  * [new branch]              gh/XuehaiPan/271/orig       -> origin/gh/XuehaiPan/271/orig
2025-12-04T08:57:05.6823421Z  * [new branch]              gh/XuehaiPan/343/base       -> origin/gh/XuehaiPan/343/base
2025-12-04T08:57:05.6824990Z  * [new branch]              gh/XuehaiPan/343/head       -> origin/gh/XuehaiPan/343/head
2025-12-04T08:57:05.6826570Z  * [new branch]              gh/XuehaiPan/343/orig       -> origin/gh/XuehaiPan/343/orig
2025-12-04T08:57:05.6828742Z  * [new branch]              gh/XuehaiPan/347/base       -> origin/gh/XuehaiPan/347/base
2025-12-04T08:57:05.6830318Z  * [new branch]              gh/XuehaiPan/347/head       -> origin/gh/XuehaiPan/347/head
2025-12-04T08:57:05.6831959Z  * [new branch]              gh/XuehaiPan/347/orig       -> origin/gh/XuehaiPan/347/orig
2025-12-04T08:57:05.6834088Z  * [new branch]              gh/XuehaiPan/348/base       -> origin/gh/XuehaiPan/348/base
2025-12-04T08:57:05.6835632Z  * [new branch]              gh/XuehaiPan/348/head       -> origin/gh/XuehaiPan/348/head
2025-12-04T08:57:05.6837263Z  * [new branch]              gh/XuehaiPan/348/orig       -> origin/gh/XuehaiPan/348/orig
2025-12-04T08:57:05.6839337Z  * [new branch]              gh/XuehaiPan/350/base       -> origin/gh/XuehaiPan/350/base
2025-12-04T08:57:05.6841055Z  * [new branch]              gh/XuehaiPan/350/head       -> origin/gh/XuehaiPan/350/head
2025-12-04T08:57:05.6842635Z  * [new branch]              gh/XuehaiPan/350/orig       -> origin/gh/XuehaiPan/350/orig
2025-12-04T08:57:05.6844773Z  * [new branch]              gh/XuehaiPan/365/base       -> origin/gh/XuehaiPan/365/base
2025-12-04T08:57:05.6846455Z  * [new branch]              gh/XuehaiPan/365/head       -> origin/gh/XuehaiPan/365/head
2025-12-04T08:57:05.6848193Z  * [new branch]              gh/XuehaiPan/365/orig       -> origin/gh/XuehaiPan/365/orig
2025-12-04T08:57:05.6850308Z  * [new branch]              gh/XuehaiPan/366/base       -> origin/gh/XuehaiPan/366/base
2025-12-04T08:57:05.6851872Z  * [new branch]              gh/XuehaiPan/366/head       -> origin/gh/XuehaiPan/366/head
2025-12-04T08:57:05.6853969Z  * [new branch]              gh/XuehaiPan/370/base       -> origin/gh/XuehaiPan/370/base
2025-12-04T08:57:05.6855549Z  * [new branch]              gh/XuehaiPan/370/head       -> origin/gh/XuehaiPan/370/head
2025-12-04T08:57:05.6857164Z  * [new branch]              gh/XuehaiPan/370/orig       -> origin/gh/XuehaiPan/370/orig
2025-12-04T08:57:05.6859426Z  * [new branch]              gh/XuehaiPan/390/base       -> origin/gh/XuehaiPan/390/base
2025-12-04T08:57:05.6860974Z  * [new branch]              gh/XuehaiPan/390/head       -> origin/gh/XuehaiPan/390/head
2025-12-04T08:57:05.6862750Z  * [new branch]              gh/XuehaiPan/390/orig       -> origin/gh/XuehaiPan/390/orig
2025-12-04T08:57:05.6864874Z  * [new branch]              gh/XuehaiPan/391/base       -> origin/gh/XuehaiPan/391/base
2025-12-04T08:57:05.6866436Z  * [new branch]              gh/XuehaiPan/391/head       -> origin/gh/XuehaiPan/391/head
2025-12-04T08:57:05.6867984Z  * [new branch]              gh/XuehaiPan/391/orig       -> origin/gh/XuehaiPan/391/orig
2025-12-04T08:57:05.6870114Z  * [new branch]              gh/XuehaiPan/392/base       -> origin/gh/XuehaiPan/392/base
2025-12-04T08:57:05.6871754Z  * [new branch]              gh/XuehaiPan/392/head       -> origin/gh/XuehaiPan/392/head
2025-12-04T08:57:05.6873319Z  * [new branch]              gh/XuehaiPan/392/orig       -> origin/gh/XuehaiPan/392/orig
2025-12-04T08:57:05.6875918Z  * [new branch]              gh/XuehaiPan/394/base       -> origin/gh/XuehaiPan/394/base
2025-12-04T08:57:05.6877493Z  * [new branch]              gh/XuehaiPan/394/head       -> origin/gh/XuehaiPan/394/head
2025-12-04T08:57:05.6879575Z  * [new branch]              gh/XuehaiPan/394/orig       -> origin/gh/XuehaiPan/394/orig
2025-12-04T08:57:05.6881975Z  * [new branch]              gh/XuehaiPan/397/base       -> origin/gh/XuehaiPan/397/base
2025-12-04T08:57:05.6883559Z  * [new branch]              gh/XuehaiPan/397/head       -> origin/gh/XuehaiPan/397/head
2025-12-04T08:57:05.6885198Z  * [new branch]              gh/XuehaiPan/397/orig       -> origin/gh/XuehaiPan/397/orig
2025-12-04T08:57:05.6887382Z  * [new branch]              gh/XuehaiPan/398/base       -> origin/gh/XuehaiPan/398/base
2025-12-04T08:57:05.6888960Z  * [new branch]              gh/XuehaiPan/398/head       -> origin/gh/XuehaiPan/398/head
2025-12-04T08:57:05.6890555Z  * [new branch]              gh/XuehaiPan/398/orig       -> origin/gh/XuehaiPan/398/orig
2025-12-04T08:57:05.6892666Z  * [new branch]              gh/XuehaiPan/399/base       -> origin/gh/XuehaiPan/399/base
2025-12-04T08:57:05.6894330Z  * [new branch]              gh/XuehaiPan/399/head       -> origin/gh/XuehaiPan/399/head
2025-12-04T08:57:05.6895958Z  * [new branch]              gh/XuehaiPan/399/orig       -> origin/gh/XuehaiPan/399/orig
2025-12-04T08:57:05.6898168Z  * [new branch]              gh/XuehaiPan/400/base       -> origin/gh/XuehaiPan/400/base
2025-12-04T08:57:05.6899772Z  * [new branch]              gh/XuehaiPan/400/head       -> origin/gh/XuehaiPan/400/head
2025-12-04T08:57:05.6901344Z  * [new branch]              gh/XuehaiPan/400/orig       -> origin/gh/XuehaiPan/400/orig
2025-12-04T08:57:05.6904001Z  * [new branch]              gh/ZhiweiYan-96/39/base     -> origin/gh/ZhiweiYan-96/39/base
2025-12-04T08:57:05.6905600Z  * [new branch]              gh/ZhiweiYan-96/39/head     -> origin/gh/ZhiweiYan-96/39/head
2025-12-04T08:57:05.6907313Z  * [new branch]              gh/ZhiweiYan-96/39/orig     -> origin/gh/ZhiweiYan-96/39/orig
2025-12-04T08:57:05.6909433Z  * [new branch]              gh/ZhiweiYan-96/44/base     -> origin/gh/ZhiweiYan-96/44/base
2025-12-04T08:57:05.6911145Z  * [new branch]              gh/ZhiweiYan-96/44/head     -> origin/gh/ZhiweiYan-96/44/head
2025-12-04T08:57:05.6913210Z  * [new branch]              gh/ZhiweiYan-96/45/base     -> origin/gh/ZhiweiYan-96/45/base
2025-12-04T08:57:05.6914698Z  * [new branch]              gh/ZhiweiYan-96/45/head     -> origin/gh/ZhiweiYan-96/45/head
2025-12-04T08:57:05.6916898Z  * [new branch]              gh/ZhiweiYan-96/49/base     -> origin/gh/ZhiweiYan-96/49/base
2025-12-04T08:57:05.6918817Z  * [new branch]              gh/ZhiweiYan-96/49/head     -> origin/gh/ZhiweiYan-96/49/head
2025-12-04T08:57:05.6921143Z  * [new branch]              gh/ZhiweiYan-96/62/base     -> origin/gh/ZhiweiYan-96/62/base
2025-12-04T08:57:05.6922662Z  * [new branch]              gh/ZhiweiYan-96/62/head     -> origin/gh/ZhiweiYan-96/62/head
2025-12-04T08:57:05.6924816Z  * [new branch]              gh/ZhiweiYan-96/66/base     -> origin/gh/ZhiweiYan-96/66/base
2025-12-04T08:57:05.6926381Z  * [new branch]              gh/ZhiweiYan-96/66/head     -> origin/gh/ZhiweiYan-96/66/head
2025-12-04T08:57:05.6928501Z  * [new branch]              gh/ZhiweiYan-96/67/base     -> origin/gh/ZhiweiYan-96/67/base
2025-12-04T08:57:05.6930048Z  * [new branch]              gh/ZhiweiYan-96/67/head     -> origin/gh/ZhiweiYan-96/67/head
2025-12-04T08:57:05.6932097Z  * [new branch]              gh/ZhiweiYan-96/68/base     -> origin/gh/ZhiweiYan-96/68/base
2025-12-04T08:57:05.6933632Z  * [new branch]              gh/ZhiweiYan-96/68/head     -> origin/gh/ZhiweiYan-96/68/head
2025-12-04T08:57:05.6935212Z  * [new branch]              gh/ZhiweiYan-96/68/orig     -> origin/gh/ZhiweiYan-96/68/orig
2025-12-04T08:57:05.6937800Z  * [new branch]              gh/aakhundov/1/base         -> origin/gh/aakhundov/1/base
2025-12-04T08:57:05.6939474Z  * [new branch]              gh/aakhundov/1/head         -> origin/gh/aakhundov/1/head
2025-12-04T08:57:05.6941496Z  * [new branch]              gh/aakhundov/2/base         -> origin/gh/aakhundov/2/base
2025-12-04T08:57:05.6943035Z  * [new branch]              gh/aakhundov/2/head         -> origin/gh/aakhundov/2/head
2025-12-04T08:57:05.6945185Z  * [new branch]              gh/aditew01/openblas        -> origin/gh/aditew01/openblas
2025-12-04T08:57:05.6946836Z  * [new branch]              gh/aditew01/sbgemm          -> origin/gh/aditew01/sbgemm
2025-12-04T08:57:05.6948378Z  * [new branch]              gh/aditew01/vecbf16         -> origin/gh/aditew01/vecbf16
2025-12-04T08:57:05.6950912Z  * [new branch]              gh/albanD/4/base            -> origin/gh/albanD/4/base
2025-12-04T08:57:05.6952553Z  * [new branch]              gh/albanD/4/head            -> origin/gh/albanD/4/head
2025-12-04T08:57:05.6954159Z  * [new branch]              gh/albanD/4/orig            -> origin/gh/albanD/4/orig
2025-12-04T08:57:05.6956531Z  * [new branch]              gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init
2025-12-04T08:57:05.6958988Z  * [new branch]              gh/alexsamardzic/12/base    -> origin/gh/alexsamardzic/12/base
2025-12-04T08:57:05.6960675Z  * [new branch]              gh/alexsamardzic/12/head    -> origin/gh/alexsamardzic/12/head
2025-12-04T08:57:05.6962318Z  * [new branch]              gh/alexsamardzic/12/orig    -> origin/gh/alexsamardzic/12/orig
2025-12-04T08:57:05.6964446Z  * [new branch]              gh/alexsamardzic/14/base    -> origin/gh/alexsamardzic/14/base
2025-12-04T08:57:05.6966080Z  * [new branch]              gh/alexsamardzic/14/head    -> origin/gh/alexsamardzic/14/head
2025-12-04T08:57:05.6967667Z  * [new branch]              gh/alexsamardzic/14/orig    -> origin/gh/alexsamardzic/14/orig
2025-12-04T08:57:05.6969852Z  * [new branch]              gh/alexsamardzic/15/base    -> origin/gh/alexsamardzic/15/base
2025-12-04T08:57:05.6971385Z  * [new branch]              gh/alexsamardzic/15/head    -> origin/gh/alexsamardzic/15/head
2025-12-04T08:57:05.6972967Z  * [new branch]              gh/alexsamardzic/15/orig    -> origin/gh/alexsamardzic/15/orig
2025-12-04T08:57:05.6975767Z  * [new branch]              gh/amjames/18/base          -> origin/gh/amjames/18/base
2025-12-04T08:57:05.6977254Z  * [new branch]              gh/amjames/18/head          -> origin/gh/amjames/18/head
2025-12-04T08:57:05.6978835Z  * [new branch]              gh/amjames/18/orig          -> origin/gh/amjames/18/orig
2025-12-04T08:57:05.6981506Z  * [new branch]              gh/andrewor14/35/base       -> origin/gh/andrewor14/35/base
2025-12-04T08:57:05.6983268Z  * [new branch]              gh/andrewor14/35/head       -> origin/gh/andrewor14/35/head
2025-12-04T08:57:05.6984868Z  * [new branch]              gh/andrewor14/35/orig       -> origin/gh/andrewor14/35/orig
2025-12-04T08:57:05.6987216Z  * [new branch]              gh/andrewor14/50/base       -> origin/gh/andrewor14/50/base
2025-12-04T08:57:05.6988886Z  * [new branch]              gh/andrewor14/50/head       -> origin/gh/andrewor14/50/head
2025-12-04T08:57:05.6990577Z  * [new branch]              gh/andrewor14/50/orig       -> origin/gh/andrewor14/50/orig
2025-12-04T08:57:05.6993158Z  * [new branch]              gh/andyanwang/30/base       -> origin/gh/andyanwang/30/base
2025-12-04T08:57:05.6994914Z  * [new branch]              gh/andyanwang/30/orig       -> origin/gh/andyanwang/30/orig
2025-12-04T08:57:05.6997124Z  * [new branch]              gh/andyanwang/31/base       -> origin/gh/andyanwang/31/base
2025-12-04T08:57:05.6998821Z  * [new branch]              gh/andyanwang/31/orig       -> origin/gh/andyanwang/31/orig
2025-12-04T08:57:05.7001126Z  * [new branch]              gh/andyanwang/39/base       -> origin/gh/andyanwang/39/base
2025-12-04T08:57:05.7002758Z  * [new branch]              gh/andyanwang/39/head       -> origin/gh/andyanwang/39/head
2025-12-04T08:57:05.7004351Z  * [new branch]              gh/andyanwang/39/orig       -> origin/gh/andyanwang/39/orig
2025-12-04T08:57:05.7006669Z  * [new branch]              gh/andyanwang/42/base       -> origin/gh/andyanwang/42/base
2025-12-04T08:57:05.7008197Z  * [new branch]              gh/andyanwang/42/head       -> origin/gh/andyanwang/42/head
2025-12-04T08:57:05.7009832Z  * [new branch]              gh/andyanwang/42/orig       -> origin/gh/andyanwang/42/orig
2025-12-04T08:57:05.7012119Z  * [new branch]              gh/andyanwang/45/base       -> origin/gh/andyanwang/45/base
2025-12-04T08:57:05.7013776Z  * [new branch]              gh/andyanwang/45/head       -> origin/gh/andyanwang/45/head
2025-12-04T08:57:05.7015376Z  * [new branch]              gh/andyanwang/45/orig       -> origin/gh/andyanwang/45/orig
2025-12-04T08:57:05.7018134Z  * [new branch]              gh/angelayi/107/base        -> origin/gh/angelayi/107/base
2025-12-04T08:57:05.7019739Z  * [new branch]              gh/angelayi/107/head        -> origin/gh/angelayi/107/head
2025-12-04T08:57:05.7021911Z  * [new branch]              gh/angelayi/114/base        -> origin/gh/angelayi/114/base
2025-12-04T08:57:05.7023579Z  * [new branch]              gh/angelayi/114/head        -> origin/gh/angelayi/114/head
2025-12-04T08:57:05.7025151Z  * [new branch]              gh/angelayi/114/orig        -> origin/gh/angelayi/114/orig
2025-12-04T08:57:05.7027280Z  * [new branch]              gh/angelayi/116/base        -> origin/gh/angelayi/116/base
2025-12-04T08:57:05.7028888Z  * [new branch]              gh/angelayi/116/head        -> origin/gh/angelayi/116/head
2025-12-04T08:57:05.7030426Z  * [new branch]              gh/angelayi/116/orig        -> origin/gh/angelayi/116/orig
2025-12-04T08:57:05.7032672Z  * [new branch]              gh/angelayi/122/base        -> origin/gh/angelayi/122/base
2025-12-04T08:57:05.7034172Z  * [new branch]              gh/angelayi/122/head        -> origin/gh/angelayi/122/head
2025-12-04T08:57:05.7035804Z  * [new branch]              gh/angelayi/122/orig        -> origin/gh/angelayi/122/orig
2025-12-04T08:57:05.7038038Z  * [new branch]              gh/angelayi/124/base        -> origin/gh/angelayi/124/base
2025-12-04T08:57:05.7039605Z  * [new branch]              gh/angelayi/124/head        -> origin/gh/angelayi/124/head
2025-12-04T08:57:05.7041728Z  * [new branch]              gh/angelayi/124/orig        -> origin/gh/angelayi/124/orig
2025-12-04T08:57:05.7043655Z  * [new branch]              gh/angelayi/128/base        -> origin/gh/angelayi/128/base
2025-12-04T08:57:05.7045222Z  * [new branch]              gh/angelayi/128/head        -> origin/gh/angelayi/128/head
2025-12-04T08:57:05.7046793Z  * [new branch]              gh/angelayi/128/orig        -> origin/gh/angelayi/128/orig
2025-12-04T08:57:05.7048923Z  * [new branch]              gh/angelayi/131/base        -> origin/gh/angelayi/131/base
2025-12-04T08:57:05.7050500Z  * [new branch]              gh/angelayi/131/head        -> origin/gh/angelayi/131/head
2025-12-04T08:57:05.7052107Z  * [new branch]              gh/angelayi/131/orig        -> origin/gh/angelayi/131/orig
2025-12-04T08:57:05.7054500Z  * [new branch]              gh/angelayi/132/base        -> origin/gh/angelayi/132/base
2025-12-04T08:57:05.7056353Z  * [new branch]              gh/angelayi/132/head        -> origin/gh/angelayi/132/head
2025-12-04T08:57:05.7058078Z  * [new branch]              gh/angelayi/132/orig        -> origin/gh/angelayi/132/orig
2025-12-04T08:57:05.7060226Z  * [new branch]              gh/angelayi/133/base        -> origin/gh/angelayi/133/base
2025-12-04T08:57:05.7061841Z  * [new branch]              gh/angelayi/133/head        -> origin/gh/angelayi/133/head
2025-12-04T08:57:05.7063460Z  * [new branch]              gh/angelayi/133/orig        -> origin/gh/angelayi/133/orig
2025-12-04T08:57:05.7065791Z  * [new branch]              gh/angelayi/134/base        -> origin/gh/angelayi/134/base
2025-12-04T08:57:05.7067572Z  * [new branch]              gh/angelayi/134/head        -> origin/gh/angelayi/134/head
2025-12-04T08:57:05.7069213Z  * [new branch]              gh/angelayi/134/orig        -> origin/gh/angelayi/134/orig
2025-12-04T08:57:05.7071477Z  * [new branch]              gh/angelayi/135/base        -> origin/gh/angelayi/135/base
2025-12-04T08:57:05.7073159Z  * [new branch]              gh/angelayi/135/head        -> origin/gh/angelayi/135/head
2025-12-04T08:57:05.7074739Z  * [new branch]              gh/angelayi/135/orig        -> origin/gh/angelayi/135/orig
2025-12-04T08:57:05.7076846Z  * [new branch]              gh/angelayi/136/base        -> origin/gh/angelayi/136/base
2025-12-04T08:57:05.7078399Z  * [new branch]              gh/angelayi/136/head        -> origin/gh/angelayi/136/head
2025-12-04T08:57:05.7080086Z  * [new branch]              gh/angelayi/136/orig        -> origin/gh/angelayi/136/orig
2025-12-04T08:57:05.7082340Z  * [new branch]              gh/angelayi/137/base        -> origin/gh/angelayi/137/base
2025-12-04T08:57:05.7083850Z  * [new branch]              gh/angelayi/137/head        -> origin/gh/angelayi/137/head
2025-12-04T08:57:05.7085603Z  * [new branch]              gh/angelayi/137/orig        -> origin/gh/angelayi/137/orig
2025-12-04T08:57:05.7087638Z  * [new branch]              gh/angelayi/138/base        -> origin/gh/angelayi/138/base
2025-12-04T08:57:05.7089208Z  * [new branch]              gh/angelayi/138/head        -> origin/gh/angelayi/138/head
2025-12-04T08:57:05.7090742Z  * [new branch]              gh/angelayi/138/orig        -> origin/gh/angelayi/138/orig
2025-12-04T08:57:05.7092855Z  * [new branch]              gh/angelayi/139/base        -> origin/gh/angelayi/139/base
2025-12-04T08:57:05.7094494Z  * [new branch]              gh/angelayi/139/head        -> origin/gh/angelayi/139/head
2025-12-04T08:57:05.7096091Z  * [new branch]              gh/angelayi/139/orig        -> origin/gh/angelayi/139/orig
2025-12-04T08:57:05.7098308Z  * [new branch]              gh/angelayi/140/base        -> origin/gh/angelayi/140/base
2025-12-04T08:57:05.7099977Z  * [new branch]              gh/angelayi/140/head        -> origin/gh/angelayi/140/head
2025-12-04T08:57:05.7101663Z  * [new branch]              gh/angelayi/140/orig        -> origin/gh/angelayi/140/orig
2025-12-04T08:57:05.7104295Z  * [new branch]              gh/angelayi/141/base        -> origin/gh/angelayi/141/base
2025-12-04T08:57:05.7106037Z  * [new branch]              gh/angelayi/141/head        -> origin/gh/angelayi/141/head
2025-12-04T08:57:05.7107646Z  * [new branch]              gh/angelayi/141/orig        -> origin/gh/angelayi/141/orig
2025-12-04T08:57:05.7109821Z  * [new branch]              gh/angelayi/142/base        -> origin/gh/angelayi/142/base
2025-12-04T08:57:05.7111445Z  * [new branch]              gh/angelayi/142/head        -> origin/gh/angelayi/142/head
2025-12-04T08:57:05.7113041Z  * [new branch]              gh/angelayi/142/orig        -> origin/gh/angelayi/142/orig
2025-12-04T08:57:05.7115257Z  * [new branch]              gh/angelayi/143/base        -> origin/gh/angelayi/143/base
2025-12-04T08:57:05.7116824Z  * [new branch]              gh/angelayi/143/head        -> origin/gh/angelayi/143/head
2025-12-04T08:57:05.7120858Z  * [new branch]              gh/angelayi/143/orig        -> origin/gh/angelayi/143/orig
2025-12-04T08:57:05.7123057Z  * [new branch]              gh/angelayi/144/base        -> origin/gh/angelayi/144/base
2025-12-04T08:57:05.7124701Z  * [new branch]              gh/angelayi/144/head        -> origin/gh/angelayi/144/head
2025-12-04T08:57:05.7126361Z  * [new branch]              gh/angelayi/144/orig        -> origin/gh/angelayi/144/orig
2025-12-04T08:57:05.7129128Z  * [new branch]              gh/anijain2305/753/base     -> origin/gh/anijain2305/753/base
2025-12-04T08:57:05.7130870Z  * [new branch]              gh/anijain2305/753/head     -> origin/gh/anijain2305/753/head
2025-12-04T08:57:05.7132368Z  * [new branch]              gh/anijain2305/753/orig     -> origin/gh/anijain2305/753/orig
2025-12-04T08:57:05.7134591Z  * [new branch]              gh/anijain2305/810/base     -> origin/gh/anijain2305/810/base
2025-12-04T08:57:05.7136207Z  * [new branch]              gh/anijain2305/810/head     -> origin/gh/anijain2305/810/head
2025-12-04T08:57:05.7137804Z  * [new branch]              gh/anijain2305/810/orig     -> origin/gh/anijain2305/810/orig
2025-12-04T08:57:05.7140017Z  * [new branch]              gh/anijain2305/854/base     -> origin/gh/anijain2305/854/base
2025-12-04T08:57:05.7141661Z  * [new branch]              gh/anijain2305/854/head     -> origin/gh/anijain2305/854/head
2025-12-04T08:57:05.7143229Z  * [new branch]              gh/anijain2305/854/orig     -> origin/gh/anijain2305/854/orig
2025-12-04T08:57:05.7145470Z  * [new branch]              gh/anijain2305/864/base     -> origin/gh/anijain2305/864/base
2025-12-04T08:57:05.7147072Z  * [new branch]              gh/anijain2305/864/head     -> origin/gh/anijain2305/864/head
2025-12-04T08:57:05.7148675Z  * [new branch]              gh/anijain2305/864/orig     -> origin/gh/anijain2305/864/orig
2025-12-04T08:57:05.7150854Z  * [new branch]              gh/anijain2305/870/base     -> origin/gh/anijain2305/870/base
2025-12-04T08:57:05.7152431Z  * [new branch]              gh/anijain2305/870/head     -> origin/gh/anijain2305/870/head
2025-12-04T08:57:05.7154064Z  * [new branch]              gh/anijain2305/870/orig     -> origin/gh/anijain2305/870/orig
2025-12-04T08:57:05.7156369Z  * [new branch]              gh/anijain2305/873/base     -> origin/gh/anijain2305/873/base
2025-12-04T08:57:05.7157851Z  * [new branch]              gh/anijain2305/873/head     -> origin/gh/anijain2305/873/head
2025-12-04T08:57:05.7159392Z  * [new branch]              gh/anijain2305/873/orig     -> origin/gh/anijain2305/873/orig
2025-12-04T08:57:05.7161697Z  * [new branch]              gh/anijain2305/894/base     -> origin/gh/anijain2305/894/base
2025-12-04T08:57:05.7163331Z  * [new branch]              gh/anijain2305/894/head     -> origin/gh/anijain2305/894/head
2025-12-04T08:57:05.7164989Z  * [new branch]              gh/anijain2305/894/orig     -> origin/gh/anijain2305/894/orig
2025-12-04T08:57:05.7167175Z  * [new branch]              gh/anijain2305/895/base     -> origin/gh/anijain2305/895/base
2025-12-04T08:57:05.7168763Z  * [new branch]              gh/anijain2305/895/head     -> origin/gh/anijain2305/895/head
2025-12-04T08:57:05.7170354Z  * [new branch]              gh/anijain2305/895/orig     -> origin/gh/anijain2305/895/orig
2025-12-04T08:57:05.7172663Z  * [new branch]              gh/anijain2305/910/base     -> origin/gh/anijain2305/910/base
2025-12-04T08:57:05.7174191Z  * [new branch]              gh/anijain2305/910/head     -> origin/gh/anijain2305/910/head
2025-12-04T08:57:05.7175714Z  * [new branch]              gh/anijain2305/910/orig     -> origin/gh/anijain2305/910/orig
2025-12-04T08:57:05.7177955Z  * [new branch]              gh/anijain2305/919/base     -> origin/gh/anijain2305/919/base
2025-12-04T08:57:05.7179600Z  * [new branch]              gh/anijain2305/919/head     -> origin/gh/anijain2305/919/head
2025-12-04T08:57:05.7181213Z  * [new branch]              gh/anijain2305/919/orig     -> origin/gh/anijain2305/919/orig
2025-12-04T08:57:05.7183360Z  * [new branch]              gh/anijain2305/922/base     -> origin/gh/anijain2305/922/base
2025-12-04T08:57:05.7185017Z  * [new branch]              gh/anijain2305/922/head     -> origin/gh/anijain2305/922/head
2025-12-04T08:57:05.7186670Z  * [new branch]              gh/anijain2305/922/orig     -> origin/gh/anijain2305/922/orig
2025-12-04T08:57:05.7188834Z  * [new branch]              gh/anijain2305/932/base     -> origin/gh/anijain2305/932/base
2025-12-04T08:57:05.7190572Z  * [new branch]              gh/anijain2305/932/head     -> origin/gh/anijain2305/932/head
2025-12-04T08:57:05.7192252Z  * [new branch]              gh/anijain2305/932/orig     -> origin/gh/anijain2305/932/orig
2025-12-04T08:57:05.7194382Z  * [new branch]              gh/anijain2305/940/base     -> origin/gh/anijain2305/940/base
2025-12-04T08:57:05.7196003Z  * [new branch]              gh/anijain2305/940/head     -> origin/gh/anijain2305/940/head
2025-12-04T08:57:05.7197577Z  * [new branch]              gh/anijain2305/940/orig     -> origin/gh/anijain2305/940/orig
2025-12-04T08:57:05.7199716Z  * [new branch]              gh/anijain2305/941/base     -> origin/gh/anijain2305/941/base
2025-12-04T08:57:05.7201432Z  * [new branch]              gh/anijain2305/941/head     -> origin/gh/anijain2305/941/head
2025-12-04T08:57:05.7203046Z  * [new branch]              gh/anijain2305/941/orig     -> origin/gh/anijain2305/941/orig
2025-12-04T08:57:05.7205232Z  * [new branch]              gh/anijain2305/942/base     -> origin/gh/anijain2305/942/base
2025-12-04T08:57:05.7206848Z  * [new branch]              gh/anijain2305/942/head     -> origin/gh/anijain2305/942/head
2025-12-04T08:57:05.7208502Z  * [new branch]              gh/anijain2305/942/orig     -> origin/gh/anijain2305/942/orig
2025-12-04T08:57:05.7210692Z  * [new branch]              gh/anijain2305/943/base     -> origin/gh/anijain2305/943/base
2025-12-04T08:57:05.7212334Z  * [new branch]              gh/anijain2305/943/head     -> origin/gh/anijain2305/943/head
2025-12-04T08:57:05.7213918Z  * [new branch]              gh/anijain2305/943/orig     -> origin/gh/anijain2305/943/orig
2025-12-04T08:57:05.7216625Z  * [new branch]              gh/anijain2305/944/base     -> origin/gh/anijain2305/944/base
2025-12-04T08:57:05.7218440Z  * [new branch]              gh/anijain2305/944/head     -> origin/gh/anijain2305/944/head
2025-12-04T08:57:05.7219999Z  * [new branch]              gh/anijain2305/944/orig     -> origin/gh/anijain2305/944/orig
2025-12-04T08:57:05.7222660Z  * [new branch]              gh/anijain2305/945/base     -> origin/gh/anijain2305/945/base
2025-12-04T08:57:05.7224329Z  * [new branch]              gh/anijain2305/945/head     -> origin/gh/anijain2305/945/head
2025-12-04T08:57:05.7225902Z  * [new branch]              gh/anijain2305/945/orig     -> origin/gh/anijain2305/945/orig
2025-12-04T08:57:05.7228098Z  * [new branch]              gh/anijain2305/946/base     -> origin/gh/anijain2305/946/base
2025-12-04T08:57:05.7229675Z  * [new branch]              gh/anijain2305/946/head     -> origin/gh/anijain2305/946/head
2025-12-04T08:57:05.7231309Z  * [new branch]              gh/anijain2305/946/orig     -> origin/gh/anijain2305/946/orig
2025-12-04T08:57:05.7233495Z  * [new branch]              gh/anijain2305/947/base     -> origin/gh/anijain2305/947/base
2025-12-04T08:57:05.7235303Z  * [new branch]              gh/anijain2305/947/head     -> origin/gh/anijain2305/947/head
2025-12-04T08:57:05.7236968Z  * [new branch]              gh/anijain2305/947/orig     -> origin/gh/anijain2305/947/orig
2025-12-04T08:57:05.7239396Z  * [new branch]              gh/anijain2305/948/base     -> origin/gh/anijain2305/948/base
2025-12-04T08:57:05.7241092Z  * [new branch]              gh/anijain2305/948/head     -> origin/gh/anijain2305/948/head
2025-12-04T08:57:05.7242666Z  * [new branch]              gh/anijain2305/948/orig     -> origin/gh/anijain2305/948/orig
2025-12-04T08:57:05.7244948Z  * [new branch]              gh/anijain2305/949/base     -> origin/gh/anijain2305/949/base
2025-12-04T08:57:05.7246614Z  * [new branch]              gh/anijain2305/949/head     -> origin/gh/anijain2305/949/head
2025-12-04T08:57:05.7248331Z  * [new branch]              gh/anijain2305/949/orig     -> origin/gh/anijain2305/949/orig
2025-12-04T08:57:05.7250561Z  * [new branch]              gh/anijain2305/950/base     -> origin/gh/anijain2305/950/base
2025-12-04T08:57:05.7252147Z  * [new branch]              gh/anijain2305/950/head     -> origin/gh/anijain2305/950/head
2025-12-04T08:57:05.7253705Z  * [new branch]              gh/anijain2305/950/orig     -> origin/gh/anijain2305/950/orig
2025-12-04T08:57:05.7256079Z  * [new branch]              gh/anijain2305/951/base     -> origin/gh/anijain2305/951/base
2025-12-04T08:57:05.7257630Z  * [new branch]              gh/anijain2305/951/head     -> origin/gh/anijain2305/951/head
2025-12-04T08:57:05.7259219Z  * [new branch]              gh/anijain2305/951/orig     -> origin/gh/anijain2305/951/orig
2025-12-04T08:57:05.7261502Z  * [new branch]              gh/anijain2305/952/base     -> origin/gh/anijain2305/952/base
2025-12-04T08:57:05.7263097Z  * [new branch]              gh/anijain2305/952/head     -> origin/gh/anijain2305/952/head
2025-12-04T08:57:05.7264696Z  * [new branch]              gh/anijain2305/952/orig     -> origin/gh/anijain2305/952/orig
2025-12-04T08:57:05.7266926Z  * [new branch]              gh/anijain2305/953/base     -> origin/gh/anijain2305/953/base
2025-12-04T08:57:05.7268516Z  * [new branch]              gh/anijain2305/953/head     -> origin/gh/anijain2305/953/head
2025-12-04T08:57:05.7270148Z  * [new branch]              gh/anijain2305/953/orig     -> origin/gh/anijain2305/953/orig
2025-12-04T08:57:05.7272376Z  * [new branch]              gh/anijain2305/954/base     -> origin/gh/anijain2305/954/base
2025-12-04T08:57:05.7274041Z  * [new branch]              gh/anijain2305/954/head     -> origin/gh/anijain2305/954/head
2025-12-04T08:57:05.7275592Z  * [new branch]              gh/anijain2305/954/orig     -> origin/gh/anijain2305/954/orig
2025-12-04T08:57:05.7277928Z  * [new branch]              gh/anijain2305/955/base     -> origin/gh/anijain2305/955/base
2025-12-04T08:57:05.7279556Z  * [new branch]              gh/anijain2305/955/head     -> origin/gh/anijain2305/955/head
2025-12-04T08:57:05.7281293Z  * [new branch]              gh/anijain2305/955/orig     -> origin/gh/anijain2305/955/orig
2025-12-04T08:57:05.7283560Z  * [new branch]              gh/anijain2305/956/base     -> origin/gh/anijain2305/956/base
2025-12-04T08:57:05.7285147Z  * [new branch]              gh/anijain2305/956/head     -> origin/gh/anijain2305/956/head
2025-12-04T08:57:05.7286761Z  * [new branch]              gh/anijain2305/956/orig     -> origin/gh/anijain2305/956/orig
2025-12-04T08:57:05.7289059Z  * [new branch]              gh/anijain2305/957/base     -> origin/gh/anijain2305/957/base
2025-12-04T08:57:05.7290656Z  * [new branch]              gh/anijain2305/957/head     -> origin/gh/anijain2305/957/head
2025-12-04T08:57:05.7292250Z  * [new branch]              gh/anijain2305/957/orig     -> origin/gh/anijain2305/957/orig
2025-12-04T08:57:05.7294517Z  * [new branch]              gh/anijain2305/958/base     -> origin/gh/anijain2305/958/base
2025-12-04T08:57:05.7296177Z  * [new branch]              gh/anijain2305/958/head     -> origin/gh/anijain2305/958/head
2025-12-04T08:57:05.7297853Z  * [new branch]              gh/anijain2305/958/orig     -> origin/gh/anijain2305/958/orig
2025-12-04T08:57:05.7300063Z  * [new branch]              gh/anijain2305/959/base     -> origin/gh/anijain2305/959/base
2025-12-04T08:57:05.7301729Z  * [new branch]              gh/anijain2305/959/head     -> origin/gh/anijain2305/959/head
2025-12-04T08:57:05.7303368Z  * [new branch]              gh/anijain2305/959/orig     -> origin/gh/anijain2305/959/orig
2025-12-04T08:57:05.7305569Z  * [new branch]              gh/anijain2305/960/base     -> origin/gh/anijain2305/960/base
2025-12-04T08:57:05.7307237Z  * [new branch]              gh/anijain2305/960/head     -> origin/gh/anijain2305/960/head
2025-12-04T08:57:05.7308816Z  * [new branch]              gh/anijain2305/960/orig     -> origin/gh/anijain2305/960/orig
2025-12-04T08:57:05.7311081Z  * [new branch]              gh/anijain2305/961/base     -> origin/gh/anijain2305/961/base
2025-12-04T08:57:05.7312722Z  * [new branch]              gh/anijain2305/961/head     -> origin/gh/anijain2305/961/head
2025-12-04T08:57:05.7314319Z  * [new branch]              gh/anijain2305/961/orig     -> origin/gh/anijain2305/961/orig
2025-12-04T08:57:05.7316561Z  * [new branch]              gh/anijain2305/962/base     -> origin/gh/anijain2305/962/base
2025-12-04T08:57:05.7318349Z  * [new branch]              gh/anijain2305/962/head     -> origin/gh/anijain2305/962/head
2025-12-04T08:57:05.7319998Z  * [new branch]              gh/anijain2305/962/orig     -> origin/gh/anijain2305/962/orig
2025-12-04T08:57:05.7322616Z  * [new branch]              gh/anijain2305/963/base     -> origin/gh/anijain2305/963/base
2025-12-04T08:57:05.7324313Z  * [new branch]              gh/anijain2305/963/head     -> origin/gh/anijain2305/963/head
2025-12-04T08:57:05.7325953Z  * [new branch]              gh/anijain2305/963/orig     -> origin/gh/anijain2305/963/orig
2025-12-04T08:57:05.7328275Z  * [new branch]              gh/anijain2305/964/base     -> origin/gh/anijain2305/964/base
2025-12-04T08:57:05.7329953Z  * [new branch]              gh/anijain2305/964/head     -> origin/gh/anijain2305/964/head
2025-12-04T08:57:05.7331532Z  * [new branch]              gh/anijain2305/964/orig     -> origin/gh/anijain2305/964/orig
2025-12-04T08:57:05.7333727Z  * [new branch]              gh/anijain2305/965/base     -> origin/gh/anijain2305/965/base
2025-12-04T08:57:05.7335357Z  * [new branch]              gh/anijain2305/965/head     -> origin/gh/anijain2305/965/head
2025-12-04T08:57:05.7337011Z  * [new branch]              gh/anijain2305/965/orig     -> origin/gh/anijain2305/965/orig
2025-12-04T08:57:05.7339901Z  * [new branch]              gh/anijain2305/966/base     -> origin/gh/anijain2305/966/base
2025-12-04T08:57:05.7342201Z  * [new branch]              gh/anijain2305/966/head     -> origin/gh/anijain2305/966/head
2025-12-04T08:57:05.7344607Z  * [new branch]              gh/anijain2305/966/orig     -> origin/gh/anijain2305/966/orig
2025-12-04T08:57:05.7348056Z  * [new branch]              gh/anijain2305/967/base     -> origin/gh/anijain2305/967/base
2025-12-04T08:57:05.7350522Z  * [new branch]              gh/anijain2305/967/head     -> origin/gh/anijain2305/967/head
2025-12-04T08:57:05.7352780Z  * [new branch]              gh/anijain2305/967/orig     -> origin/gh/anijain2305/967/orig
2025-12-04T08:57:05.7355784Z  * [new branch]              gh/anijain2305/968/base     -> origin/gh/anijain2305/968/base
2025-12-04T08:57:05.7358154Z  * [new branch]              gh/anijain2305/968/head     -> origin/gh/anijain2305/968/head
2025-12-04T08:57:05.7360195Z  * [new branch]              gh/anijain2305/968/orig     -> origin/gh/anijain2305/968/orig
2025-12-04T08:57:05.7363358Z  * [new branch]              gh/anijain2305/969/base     -> origin/gh/anijain2305/969/base
2025-12-04T08:57:05.7365658Z  * [new branch]              gh/anijain2305/969/head     -> origin/gh/anijain2305/969/head
2025-12-04T08:57:05.7367962Z  * [new branch]              gh/anijain2305/969/orig     -> origin/gh/anijain2305/969/orig
2025-12-04T08:57:05.7371298Z  * [new branch]              gh/anijain2305/970/base     -> origin/gh/anijain2305/970/base
2025-12-04T08:57:05.7373418Z  * [new branch]              gh/anijain2305/970/head     -> origin/gh/anijain2305/970/head
2025-12-04T08:57:05.7375720Z  * [new branch]              gh/anijain2305/970/orig     -> origin/gh/anijain2305/970/orig
2025-12-04T08:57:05.7379485Z  * [new branch]              gh/anjali411/216/base       -> origin/gh/anjali411/216/base
2025-12-04T08:57:05.7380945Z  * [new branch]              gh/anjali411/216/head       -> origin/gh/anjali411/216/head
2025-12-04T08:57:05.7382523Z  * [new branch]              gh/anjali411/216/orig       -> origin/gh/anjali411/216/orig
2025-12-04T08:57:05.7385323Z  * [new branch]              gh/anshul-si/1/base         -> origin/gh/anshul-si/1/base
2025-12-04T08:57:05.7386970Z  * [new branch]              gh/anshul-si/1/head         -> origin/gh/anshul-si/1/head
2025-12-04T08:57:05.7388965Z  * [new branch]              gh/anshul-si/2/base         -> origin/gh/anshul-si/2/base
2025-12-04T08:57:05.7390560Z  * [new branch]              gh/anshul-si/2/head         -> origin/gh/anshul-si/2/head
2025-12-04T08:57:05.7392507Z  * [new branch]              gh/anshul-si/3/base         -> origin/gh/anshul-si/3/base
2025-12-04T08:57:05.7394078Z  * [new branch]              gh/anshul-si/3/head         -> origin/gh/anshul-si/3/head
2025-12-04T08:57:05.7396162Z  * [new branch]              gh/anshul-si/4/base         -> origin/gh/anshul-si/4/base
2025-12-04T08:57:05.7397720Z  * [new branch]              gh/anshul-si/4/head         -> origin/gh/anshul-si/4/head
2025-12-04T08:57:05.7399685Z  * [new branch]              gh/anshul-si/5/base         -> origin/gh/anshul-si/5/base
2025-12-04T08:57:05.7401352Z  * [new branch]              gh/anshul-si/5/head         -> origin/gh/anshul-si/5/head
2025-12-04T08:57:05.7403639Z  * [new branch]              gh/anshul-si/53/base        -> origin/gh/anshul-si/53/base
2025-12-04T08:57:05.7405233Z  * [new branch]              gh/anshul-si/53/head        -> origin/gh/anshul-si/53/head
2025-12-04T08:57:05.7407417Z  * [new branch]              gh/anshul-si/58/base        -> origin/gh/anshul-si/58/base
2025-12-04T08:57:05.7409106Z  * [new branch]              gh/anshul-si/58/head        -> origin/gh/anshul-si/58/head
2025-12-04T08:57:05.7411126Z  * [new branch]              gh/anshul-si/66/base        -> origin/gh/anshul-si/66/base
2025-12-04T08:57:05.7412711Z  * [new branch]              gh/anshul-si/66/head        -> origin/gh/anshul-si/66/head
2025-12-04T08:57:05.7414308Z  * [new branch]              gh/anshul-si/66/orig        -> origin/gh/anshul-si/66/orig
2025-12-04T08:57:05.7416401Z  * [new branch]              gh/anshul-si/67/base        -> origin/gh/anshul-si/67/base
2025-12-04T08:57:05.7418190Z  * [new branch]              gh/anshul-si/67/head        -> origin/gh/anshul-si/67/head
2025-12-04T08:57:05.7419803Z  * [new branch]              gh/anshul-si/67/orig        -> origin/gh/anshul-si/67/orig
2025-12-04T08:57:05.7422103Z  * [new branch]              gh/anshul-si/68/base        -> origin/gh/anshul-si/68/base
2025-12-04T08:57:05.7423687Z  * [new branch]              gh/anshul-si/68/head        -> origin/gh/anshul-si/68/head
2025-12-04T08:57:05.7425273Z  * [new branch]              gh/anshul-si/68/orig        -> origin/gh/anshul-si/68/orig
2025-12-04T08:57:05.7427632Z  * [new branch]              gh/anshul-si/69/base        -> origin/gh/anshul-si/69/base
2025-12-04T08:57:05.7429219Z  * [new branch]              gh/anshul-si/69/head        -> origin/gh/anshul-si/69/head
2025-12-04T08:57:05.7430818Z  * [new branch]              gh/anshul-si/69/orig        -> origin/gh/anshul-si/69/orig
2025-12-04T08:57:05.7432925Z  * [new branch]              gh/anshul-si/70/base        -> origin/gh/anshul-si/70/base
2025-12-04T08:57:05.7434577Z  * [new branch]              gh/anshul-si/70/head        -> origin/gh/anshul-si/70/head
2025-12-04T08:57:05.7436209Z  * [new branch]              gh/anshul-si/70/orig        -> origin/gh/anshul-si/70/orig
2025-12-04T08:57:05.7438563Z  * [new branch]              gh/anshul-si/71/base        -> origin/gh/anshul-si/71/base
2025-12-04T08:57:05.7440182Z  * [new branch]              gh/anshul-si/71/head        -> origin/gh/anshul-si/71/head
2025-12-04T08:57:05.7441790Z  * [new branch]              gh/anshul-si/71/orig        -> origin/gh/anshul-si/71/orig
2025-12-04T08:57:05.7443957Z  * [new branch]              gh/anshul-si/72/base        -> origin/gh/anshul-si/72/base
2025-12-04T08:57:05.7445604Z  * [new branch]              gh/anshul-si/72/head        -> origin/gh/anshul-si/72/head
2025-12-04T08:57:05.7447181Z  * [new branch]              gh/anshul-si/72/orig        -> origin/gh/anshul-si/72/orig
2025-12-04T08:57:05.7449473Z  * [new branch]              gh/anshul-si/73/base        -> origin/gh/anshul-si/73/base
2025-12-04T08:57:05.7451106Z  * [new branch]              gh/anshul-si/73/head        -> origin/gh/anshul-si/73/head
2025-12-04T08:57:05.7452697Z  * [new branch]              gh/anshul-si/73/orig        -> origin/gh/anshul-si/73/orig
2025-12-04T08:57:05.7455439Z  * [new branch]              gh/aorenste/132/base        -> origin/gh/aorenste/132/base
2025-12-04T08:57:05.7457028Z  * [new branch]              gh/aorenste/132/head        -> origin/gh/aorenste/132/head
2025-12-04T08:57:05.7459415Z  * [new branch]              gh/aorenste/134/base        -> origin/gh/aorenste/134/base
2025-12-04T08:57:05.7461202Z  * [new branch]              gh/aorenste/134/head        -> origin/gh/aorenste/134/head
2025-12-04T08:57:05.7462865Z  * [new branch]              gh/aorenste/134/orig        -> origin/gh/aorenste/134/orig
2025-12-04T08:57:05.7465075Z  * [new branch]              gh/aorenste/139/base        -> origin/gh/aorenste/139/base
2025-12-04T08:57:05.7466600Z  * [new branch]              gh/aorenste/139/head        -> origin/gh/aorenste/139/head
2025-12-04T08:57:05.7468199Z  * [new branch]              gh/aorenste/139/orig        -> origin/gh/aorenste/139/orig
2025-12-04T08:57:05.7470372Z  * [new branch]              gh/aorenste/141/base        -> origin/gh/aorenste/141/base
2025-12-04T08:57:05.7471946Z  * [new branch]              gh/aorenste/141/head        -> origin/gh/aorenste/141/head
2025-12-04T08:57:05.7474328Z  * [new branch]              gh/aorenste/145/base        -> origin/gh/aorenste/145/base
2025-12-04T08:57:05.7475869Z  * [new branch]              gh/aorenste/145/head        -> origin/gh/aorenste/145/head
2025-12-04T08:57:05.7477633Z  * [new branch]              gh/aorenste/145/orig        -> origin/gh/aorenste/145/orig
2025-12-04T08:57:05.7479826Z  * [new branch]              gh/aorenste/146/base        -> origin/gh/aorenste/146/base
2025-12-04T08:57:05.7481750Z  * [new branch]              gh/aorenste/146/head        -> origin/gh/aorenste/146/head
2025-12-04T08:57:05.7483342Z  * [new branch]              gh/aorenste/146/orig        -> origin/gh/aorenste/146/orig
2025-12-04T08:57:05.7485589Z  * [new branch]              gh/aorenste/147/base        -> origin/gh/aorenste/147/base
2025-12-04T08:57:05.7487359Z  * [new branch]              gh/aorenste/147/head        -> origin/gh/aorenste/147/head
2025-12-04T08:57:05.7488950Z  * [new branch]              gh/aorenste/147/orig        -> origin/gh/aorenste/147/orig
2025-12-04T08:57:05.7491117Z  * [new branch]              gh/aorenste/148/base        -> origin/gh/aorenste/148/base
2025-12-04T08:57:05.7492749Z  * [new branch]              gh/aorenste/148/head        -> origin/gh/aorenste/148/head
2025-12-04T08:57:05.7494428Z  * [new branch]              gh/aorenste/148/orig        -> origin/gh/aorenste/148/orig
2025-12-04T08:57:05.7496588Z  * [new branch]              gh/aorenste/149/base        -> origin/gh/aorenste/149/base
2025-12-04T08:57:05.7498176Z  * [new branch]              gh/aorenste/149/head        -> origin/gh/aorenste/149/head
2025-12-04T08:57:05.7499771Z  * [new branch]              gh/aorenste/149/orig        -> origin/gh/aorenste/149/orig
2025-12-04T08:57:05.7501933Z  * [new branch]              gh/aorenste/150/base        -> origin/gh/aorenste/150/base
2025-12-04T08:57:05.7503682Z  * [new branch]              gh/aorenste/150/head        -> origin/gh/aorenste/150/head
2025-12-04T08:57:05.7505290Z  * [new branch]              gh/aorenste/150/orig        -> origin/gh/aorenste/150/orig
2025-12-04T08:57:05.7507294Z  * [new branch]              gh/aorenste/151/base        -> origin/gh/aorenste/151/base
2025-12-04T08:57:05.7508894Z  * [new branch]              gh/aorenste/151/head        -> origin/gh/aorenste/151/head
2025-12-04T08:57:05.7510528Z  * [new branch]              gh/aorenste/151/orig        -> origin/gh/aorenste/151/orig
2025-12-04T08:57:05.7512676Z  * [new branch]              gh/aorenste/152/base        -> origin/gh/aorenste/152/base
2025-12-04T08:57:05.7514235Z  * [new branch]              gh/aorenste/152/head        -> origin/gh/aorenste/152/head
2025-12-04T08:57:05.7515856Z  * [new branch]              gh/aorenste/152/orig        -> origin/gh/aorenste/152/orig
2025-12-04T08:57:05.7519409Z  * [new branch]              gh/aorenste/153/base        -> origin/gh/aorenste/153/base
2025-12-04T08:57:05.7521188Z  * [new branch]              gh/aorenste/153/head        -> origin/gh/aorenste/153/head
2025-12-04T08:57:05.7522811Z  * [new branch]              gh/aorenste/153/orig        -> origin/gh/aorenste/153/orig
2025-12-04T08:57:05.7524765Z  * [new branch]              gh/aorenste/154/base        -> origin/gh/aorenste/154/base
2025-12-04T08:57:05.7526426Z  * [new branch]              gh/aorenste/154/head        -> origin/gh/aorenste/154/head
2025-12-04T08:57:05.7528211Z  * [new branch]              gh/aorenste/154/orig        -> origin/gh/aorenste/154/orig
2025-12-04T08:57:05.7530010Z  * [new branch]              gh/aorenste/155/base        -> origin/gh/aorenste/155/base
2025-12-04T08:57:05.7531591Z  * [new branch]              gh/aorenste/155/head        -> origin/gh/aorenste/155/head
2025-12-04T08:57:05.7533221Z  * [new branch]              gh/aorenste/155/orig        -> origin/gh/aorenste/155/orig
2025-12-04T08:57:05.7535352Z  * [new branch]              gh/aorenste/156/base        -> origin/gh/aorenste/156/base
2025-12-04T08:57:05.7536856Z  * [new branch]              gh/aorenste/156/head        -> origin/gh/aorenste/156/head
2025-12-04T08:57:05.7538320Z  * [new branch]              gh/aorenste/156/orig        -> origin/gh/aorenste/156/orig
2025-12-04T08:57:05.7540780Z  * [new branch]              gh/aorenste/157/base        -> origin/gh/aorenste/157/base
2025-12-04T08:57:05.7542404Z  * [new branch]              gh/aorenste/157/head        -> origin/gh/aorenste/157/head
2025-12-04T08:57:05.7544010Z  * [new branch]              gh/aorenste/157/orig        -> origin/gh/aorenste/157/orig
2025-12-04T08:57:05.7546176Z  * [new branch]              gh/aorenste/158/base        -> origin/gh/aorenste/158/base
2025-12-04T08:57:05.7547878Z  * [new branch]              gh/aorenste/158/head        -> origin/gh/aorenste/158/head
2025-12-04T08:57:05.7549455Z  * [new branch]              gh/aorenste/158/orig        -> origin/gh/aorenste/158/orig
2025-12-04T08:57:05.7551532Z  * [new branch]              gh/aorenste/159/base        -> origin/gh/aorenste/159/base
2025-12-04T08:57:05.7553114Z  * [new branch]              gh/aorenste/159/head        -> origin/gh/aorenste/159/head
2025-12-04T08:57:05.7554685Z  * [new branch]              gh/aorenste/159/orig        -> origin/gh/aorenste/159/orig
2025-12-04T08:57:05.7557369Z  * [new branch]              gh/avikchaudhuri/1/base     -> origin/gh/avikchaudhuri/1/base
2025-12-04T08:57:05.7558963Z  * [new branch]              gh/avikchaudhuri/1/head     -> origin/gh/avikchaudhuri/1/head
2025-12-04T08:57:05.7561080Z  * [new branch]              gh/avikchaudhuri/2/base     -> origin/gh/avikchaudhuri/2/base
2025-12-04T08:57:05.7562656Z  * [new branch]              gh/avikchaudhuri/2/head     -> origin/gh/avikchaudhuri/2/head
2025-12-04T08:57:05.7564181Z  * [new branch]              gh/avikchaudhuri/2/orig     -> origin/gh/avikchaudhuri/2/orig
2025-12-04T08:57:05.7567080Z  * [new branch]              gh/bdhirsh/666/base         -> origin/gh/bdhirsh/666/base
2025-12-04T08:57:05.7568821Z  * [new branch]              gh/bdhirsh/666/head         -> origin/gh/bdhirsh/666/head
2025-12-04T08:57:05.7570345Z  * [new branch]              gh/bdhirsh/666/orig         -> origin/gh/bdhirsh/666/orig
2025-12-04T08:57:05.7572480Z  * [new branch]              gh/bdhirsh/668/base         -> origin/gh/bdhirsh/668/base
2025-12-04T08:57:05.7574035Z  * [new branch]              gh/bdhirsh/668/head         -> origin/gh/bdhirsh/668/head
2025-12-04T08:57:05.7575604Z  * [new branch]              gh/bdhirsh/668/orig         -> origin/gh/bdhirsh/668/orig
2025-12-04T08:57:05.7577896Z  * [new branch]              gh/bdhirsh/669/base         -> origin/gh/bdhirsh/669/base
2025-12-04T08:57:05.7579429Z  * [new branch]              gh/bdhirsh/669/head         -> origin/gh/bdhirsh/669/head
2025-12-04T08:57:05.7581010Z  * [new branch]              gh/bdhirsh/669/orig         -> origin/gh/bdhirsh/669/orig
2025-12-04T08:57:05.7583266Z  * [new branch]              gh/bdhirsh/670/base         -> origin/gh/bdhirsh/670/base
2025-12-04T08:57:05.7584964Z  * [new branch]              gh/bdhirsh/670/head         -> origin/gh/bdhirsh/670/head
2025-12-04T08:57:05.7586563Z  * [new branch]              gh/bdhirsh/670/orig         -> origin/gh/bdhirsh/670/orig
2025-12-04T08:57:05.7588857Z  * [new branch]              gh/bdhirsh/672/base         -> origin/gh/bdhirsh/672/base
2025-12-04T08:57:05.7590469Z  * [new branch]              gh/bdhirsh/672/head         -> origin/gh/bdhirsh/672/head
2025-12-04T08:57:05.7592087Z  * [new branch]              gh/bdhirsh/672/orig         -> origin/gh/bdhirsh/672/orig
2025-12-04T08:57:05.7594398Z  * [new branch]              gh/bdhirsh/675/base         -> origin/gh/bdhirsh/675/base
2025-12-04T08:57:05.7596092Z  * [new branch]              gh/bdhirsh/675/head         -> origin/gh/bdhirsh/675/head
2025-12-04T08:57:05.7597720Z  * [new branch]              gh/bdhirsh/675/orig         -> origin/gh/bdhirsh/675/orig
2025-12-04T08:57:05.7600410Z  * [new branch]              gh/bdhirsh/676/base         -> origin/gh/bdhirsh/676/base
2025-12-04T08:57:05.7602199Z  * [new branch]              gh/bdhirsh/676/head         -> origin/gh/bdhirsh/676/head
2025-12-04T08:57:05.7603790Z  * [new branch]              gh/bdhirsh/676/orig         -> origin/gh/bdhirsh/676/orig
2025-12-04T08:57:05.7605942Z  * [new branch]              gh/bdhirsh/677/base         -> origin/gh/bdhirsh/677/base
2025-12-04T08:57:05.7607985Z  * [new branch]              gh/bdhirsh/677/head         -> origin/gh/bdhirsh/677/head
2025-12-04T08:57:05.7609387Z  * [new branch]              gh/bdhirsh/677/orig         -> origin/gh/bdhirsh/677/orig
2025-12-04T08:57:05.7611745Z  * [new branch]              gh/bdhirsh/678/base         -> origin/gh/bdhirsh/678/base
2025-12-04T08:57:05.7613416Z  * [new branch]              gh/bdhirsh/678/head         -> origin/gh/bdhirsh/678/head
2025-12-04T08:57:05.7615046Z  * [new branch]              gh/bdhirsh/678/orig         -> origin/gh/bdhirsh/678/orig
2025-12-04T08:57:05.7617410Z  * [new branch]              gh/bdhirsh/679/base         -> origin/gh/bdhirsh/679/base
2025-12-04T08:57:05.7619127Z  * [new branch]              gh/bdhirsh/679/head         -> origin/gh/bdhirsh/679/head
2025-12-04T08:57:05.7620691Z  * [new branch]              gh/bdhirsh/679/orig         -> origin/gh/bdhirsh/679/orig
2025-12-04T08:57:05.7622831Z  * [new branch]              gh/bdhirsh/680/base         -> origin/gh/bdhirsh/680/base
2025-12-04T08:57:05.7624529Z  * [new branch]              gh/bdhirsh/680/head         -> origin/gh/bdhirsh/680/head
2025-12-04T08:57:05.7626447Z  * [new branch]              gh/bdhirsh/680/orig         -> origin/gh/bdhirsh/680/orig
2025-12-04T08:57:05.7628474Z  * [new branch]              gh/bdhirsh/681/base         -> origin/gh/bdhirsh/681/base
2025-12-04T08:57:05.7630210Z  * [new branch]              gh/bdhirsh/681/head         -> origin/gh/bdhirsh/681/head
2025-12-04T08:57:05.7631819Z  * [new branch]              gh/bdhirsh/681/orig         -> origin/gh/bdhirsh/681/orig
2025-12-04T08:57:05.7634780Z  * [new branch]              gh/benjaminglass1/101/base  -> origin/gh/benjaminglass1/101/base
2025-12-04T08:57:05.7636181Z  * [new branch]              gh/benjaminglass1/101/head  -> origin/gh/benjaminglass1/101/head
2025-12-04T08:57:05.7637741Z  * [new branch]              gh/benjaminglass1/101/orig  -> origin/gh/benjaminglass1/101/orig
2025-12-04T08:57:05.7639834Z  * [new branch]              gh/benjaminglass1/102/base  -> origin/gh/benjaminglass1/102/base
2025-12-04T08:57:05.7641736Z  * [new branch]              gh/benjaminglass1/102/head  -> origin/gh/benjaminglass1/102/head
2025-12-04T08:57:05.7643303Z  * [new branch]              gh/benjaminglass1/102/orig  -> origin/gh/benjaminglass1/102/orig
2025-12-04T08:57:05.7645463Z  * [new branch]              gh/benjaminglass1/106/base  -> origin/gh/benjaminglass1/106/base
2025-12-04T08:57:05.7647051Z  * [new branch]              gh/benjaminglass1/106/head  -> origin/gh/benjaminglass1/106/head
2025-12-04T08:57:05.7648717Z  * [new branch]              gh/benjaminglass1/106/orig  -> origin/gh/benjaminglass1/106/orig
2025-12-04T08:57:05.7650786Z  * [new branch]              gh/benjaminglass1/107/base  -> origin/gh/benjaminglass1/107/base
2025-12-04T08:57:05.7652352Z  * [new branch]              gh/benjaminglass1/107/head  -> origin/gh/benjaminglass1/107/head
2025-12-04T08:57:05.7653989Z  * [new branch]              gh/benjaminglass1/107/orig  -> origin/gh/benjaminglass1/107/orig
2025-12-04T08:57:05.7656108Z  * [new branch]              gh/benjaminglass1/108/base  -> origin/gh/benjaminglass1/108/base
2025-12-04T08:57:05.7657719Z  * [new branch]              gh/benjaminglass1/108/head  -> origin/gh/benjaminglass1/108/head
2025-12-04T08:57:05.7659391Z  * [new branch]              gh/benjaminglass1/108/orig  -> origin/gh/benjaminglass1/108/orig
2025-12-04T08:57:05.7661495Z  * [new branch]              gh/benjaminglass1/109/base  -> origin/gh/benjaminglass1/109/base
2025-12-04T08:57:05.7663056Z  * [new branch]              gh/benjaminglass1/109/head  -> origin/gh/benjaminglass1/109/head
2025-12-04T08:57:05.7664649Z  * [new branch]              gh/benjaminglass1/109/orig  -> origin/gh/benjaminglass1/109/orig
2025-12-04T08:57:05.7666759Z  * [new branch]              gh/benjaminglass1/97/base   -> origin/gh/benjaminglass1/97/base
2025-12-04T08:57:05.7668376Z  * [new branch]              gh/benjaminglass1/97/head   -> origin/gh/benjaminglass1/97/head
2025-12-04T08:57:05.7669986Z  * [new branch]              gh/benjaminglass1/97/orig   -> origin/gh/benjaminglass1/97/orig
2025-12-04T08:57:05.7672521Z  * [new branch]              gh/bobrenjc93/570/base      -> origin/gh/bobrenjc93/570/base
2025-12-04T08:57:05.7674152Z  * [new branch]              gh/bobrenjc93/570/head      -> origin/gh/bobrenjc93/570/head
2025-12-04T08:57:05.7675789Z  * [new branch]              gh/bobrenjc93/570/orig      -> origin/gh/bobrenjc93/570/orig
2025-12-04T08:57:05.7677814Z  * [new branch]              gh/bobrenjc93/604/base      -> origin/gh/bobrenjc93/604/base
2025-12-04T08:57:05.7679425Z  * [new branch]              gh/bobrenjc93/604/head      -> origin/gh/bobrenjc93/604/head
2025-12-04T08:57:05.7681121Z  * [new branch]              gh/bobrenjc93/604/orig      -> origin/gh/bobrenjc93/604/orig
2025-12-04T08:57:05.7683781Z  * [new branch]              gh/bobrenjc93/638/base      -> origin/gh/bobrenjc93/638/base
2025-12-04T08:57:05.7685389Z  * [new branch]              gh/bobrenjc93/638/head      -> origin/gh/bobrenjc93/638/head
2025-12-04T08:57:05.7686956Z  * [new branch]              gh/bobrenjc93/638/orig      -> origin/gh/bobrenjc93/638/orig
2025-12-04T08:57:05.7689113Z  * [new branch]              gh/bobrenjc93/653/base      -> origin/gh/bobrenjc93/653/base
2025-12-04T08:57:05.7690658Z  * [new branch]              gh/bobrenjc93/653/head      -> origin/gh/bobrenjc93/653/head
2025-12-04T08:57:05.7692264Z  * [new branch]              gh/bobrenjc93/653/orig      -> origin/gh/bobrenjc93/653/orig
2025-12-04T08:57:05.7694490Z  * [new branch]              gh/bobrenjc93/654/base      -> origin/gh/bobrenjc93/654/base
2025-12-04T08:57:05.7696222Z  * [new branch]              gh/bobrenjc93/654/head      -> origin/gh/bobrenjc93/654/head
2025-12-04T08:57:05.7697828Z  * [new branch]              gh/bobrenjc93/654/orig      -> origin/gh/bobrenjc93/654/orig
2025-12-04T08:57:05.7699907Z  * [new branch]              gh/bobrenjc93/657/base      -> origin/gh/bobrenjc93/657/base
2025-12-04T08:57:05.7701478Z  * [new branch]              gh/bobrenjc93/657/head      -> origin/gh/bobrenjc93/657/head
2025-12-04T08:57:05.7703096Z  * [new branch]              gh/bobrenjc93/657/orig      -> origin/gh/bobrenjc93/657/orig
2025-12-04T08:57:05.7705191Z  * [new branch]              gh/bobrenjc93/672/base      -> origin/gh/bobrenjc93/672/base
2025-12-04T08:57:05.7706720Z  * [new branch]              gh/bobrenjc93/672/head      -> origin/gh/bobrenjc93/672/head
2025-12-04T08:57:05.7708296Z  * [new branch]              gh/bobrenjc93/672/orig      -> origin/gh/bobrenjc93/672/orig
2025-12-04T08:57:05.7710437Z  * [new branch]              gh/bobrenjc93/679/base      -> origin/gh/bobrenjc93/679/base
2025-12-04T08:57:05.7712255Z  * [new branch]              gh/bobrenjc93/679/head      -> origin/gh/bobrenjc93/679/head
2025-12-04T08:57:05.7713838Z  * [new branch]              gh/bobrenjc93/679/orig      -> origin/gh/bobrenjc93/679/orig
2025-12-04T08:57:05.7716084Z  * [new branch]              gh/bobrenjc93/680/base      -> origin/gh/bobrenjc93/680/base
2025-12-04T08:57:05.7717709Z  * [new branch]              gh/bobrenjc93/680/head      -> origin/gh/bobrenjc93/680/head
2025-12-04T08:57:05.7719517Z  * [new branch]              gh/bobrenjc93/680/orig      -> origin/gh/bobrenjc93/680/orig
2025-12-04T08:57:05.7721787Z  * [new branch]              gh/bobrenjc93/681/base      -> origin/gh/bobrenjc93/681/base
2025-12-04T08:57:05.7723379Z  * [new branch]              gh/bobrenjc93/681/head      -> origin/gh/bobrenjc93/681/head
2025-12-04T08:57:05.7724946Z  * [new branch]              gh/bobrenjc93/681/orig      -> origin/gh/bobrenjc93/681/orig
2025-12-04T08:57:05.7726952Z  * [new branch]              gh/bobrenjc93/682/base      -> origin/gh/bobrenjc93/682/base
2025-12-04T08:57:05.7728551Z  * [new branch]              gh/bobrenjc93/682/head      -> origin/gh/bobrenjc93/682/head
2025-12-04T08:57:05.7730017Z  * [new branch]              gh/bobrenjc93/682/orig      -> origin/gh/bobrenjc93/682/orig
2025-12-04T08:57:05.7732107Z  * [new branch]              gh/bobrenjc93/683/base      -> origin/gh/bobrenjc93/683/base
2025-12-04T08:57:05.7733716Z  * [new branch]              gh/bobrenjc93/683/head      -> origin/gh/bobrenjc93/683/head
2025-12-04T08:57:05.7735346Z  * [new branch]              gh/bobrenjc93/683/orig      -> origin/gh/bobrenjc93/683/orig
2025-12-04T08:57:05.7737805Z  * [new branch]              gh/bobrenjc93/684/base      -> origin/gh/bobrenjc93/684/base
2025-12-04T08:57:05.7739270Z  * [new branch]              gh/bobrenjc93/684/head      -> origin/gh/bobrenjc93/684/head
2025-12-04T08:57:05.7741019Z  * [new branch]              gh/bobrenjc93/684/orig      -> origin/gh/bobrenjc93/684/orig
2025-12-04T08:57:05.7743105Z  * [new branch]              gh/bobrenjc93/685/base      -> origin/gh/bobrenjc93/685/base
2025-12-04T08:57:05.7744929Z  * [new branch]              gh/bobrenjc93/685/head      -> origin/gh/bobrenjc93/685/head
2025-12-04T08:57:05.7746756Z  * [new branch]              gh/bobrenjc93/685/orig      -> origin/gh/bobrenjc93/685/orig
2025-12-04T08:57:05.7749160Z  * [new branch]              gh/bobrenjc93/686/base      -> origin/gh/bobrenjc93/686/base
2025-12-04T08:57:05.7750730Z  * [new branch]              gh/bobrenjc93/686/head      -> origin/gh/bobrenjc93/686/head
2025-12-04T08:57:05.7752344Z  * [new branch]              gh/bobrenjc93/686/orig      -> origin/gh/bobrenjc93/686/orig
2025-12-04T08:57:05.7754821Z  * [new branch]              gh/bobrenjc93/687/base      -> origin/gh/bobrenjc93/687/base
2025-12-04T08:57:05.7756976Z  * [new branch]              gh/bobrenjc93/687/head      -> origin/gh/bobrenjc93/687/head
2025-12-04T08:57:05.7758641Z  * [new branch]              gh/bobrenjc93/687/orig      -> origin/gh/bobrenjc93/687/orig
2025-12-04T08:57:05.7761266Z  * [new branch]              gh/bobrenjc93/688/base      -> origin/gh/bobrenjc93/688/base
2025-12-04T08:57:05.7762862Z  * [new branch]              gh/bobrenjc93/688/head      -> origin/gh/bobrenjc93/688/head
2025-12-04T08:57:05.7764471Z  * [new branch]              gh/bobrenjc93/688/orig      -> origin/gh/bobrenjc93/688/orig
2025-12-04T08:57:05.7766587Z  * [new branch]              gh/bobrenjc93/689/base      -> origin/gh/bobrenjc93/689/base
2025-12-04T08:57:05.7768302Z  * [new branch]              gh/bobrenjc93/689/head      -> origin/gh/bobrenjc93/689/head
2025-12-04T08:57:05.7769866Z  * [new branch]              gh/bobrenjc93/689/orig      -> origin/gh/bobrenjc93/689/orig
2025-12-04T08:57:05.7771944Z  * [new branch]              gh/bobrenjc93/690/base      -> origin/gh/bobrenjc93/690/base
2025-12-04T08:57:05.7773551Z  * [new branch]              gh/bobrenjc93/690/head      -> origin/gh/bobrenjc93/690/head
2025-12-04T08:57:05.7775129Z  * [new branch]              gh/bobrenjc93/690/orig      -> origin/gh/bobrenjc93/690/orig
2025-12-04T08:57:05.7777819Z  * [new branch]              gh/bobrenjc93/691/base      -> origin/gh/bobrenjc93/691/base
2025-12-04T08:57:05.7779687Z  * [new branch]              gh/bobrenjc93/691/head      -> origin/gh/bobrenjc93/691/head
2025-12-04T08:57:05.7781598Z  * [new branch]              gh/bobrenjc93/691/orig      -> origin/gh/bobrenjc93/691/orig
2025-12-04T08:57:05.7784337Z  * [new branch]              gh/bobrenjc93/692/base      -> origin/gh/bobrenjc93/692/base
2025-12-04T08:57:05.7785936Z  * [new branch]              gh/bobrenjc93/692/head      -> origin/gh/bobrenjc93/692/head
2025-12-04T08:57:05.7787581Z  * [new branch]              gh/bobrenjc93/692/orig      -> origin/gh/bobrenjc93/692/orig
2025-12-04T08:57:05.7789719Z  * [new branch]              gh/bobrenjc93/693/base      -> origin/gh/bobrenjc93/693/base
2025-12-04T08:57:05.7791486Z  * [new branch]              gh/bobrenjc93/693/head      -> origin/gh/bobrenjc93/693/head
2025-12-04T08:57:05.7793132Z  * [new branch]              gh/bobrenjc93/693/orig      -> origin/gh/bobrenjc93/693/orig
2025-12-04T08:57:05.7795355Z  * [new branch]              gh/bobrenjc93/694/base      -> origin/gh/bobrenjc93/694/base
2025-12-04T08:57:05.7796985Z  * [new branch]              gh/bobrenjc93/694/head      -> origin/gh/bobrenjc93/694/head
2025-12-04T08:57:05.7798601Z  * [new branch]              gh/bobrenjc93/694/orig      -> origin/gh/bobrenjc93/694/orig
2025-12-04T08:57:05.7800710Z  * [new branch]              gh/bobrenjc93/695/base      -> origin/gh/bobrenjc93/695/base
2025-12-04T08:57:05.7802255Z  * [new branch]              gh/bobrenjc93/695/head      -> origin/gh/bobrenjc93/695/head
2025-12-04T08:57:05.7804320Z  * [new branch]              gh/bobrenjc93/695/orig      -> origin/gh/bobrenjc93/695/orig
2025-12-04T08:57:05.7806978Z  * [new branch]              gh/c00w/23/base             -> origin/gh/c00w/23/base
2025-12-04T08:57:05.7808696Z  * [new branch]              gh/c00w/23/head             -> origin/gh/c00w/23/head
2025-12-04T08:57:05.7810929Z  * [new branch]              gh/c00w/53/base             -> origin/gh/c00w/53/base
2025-12-04T08:57:05.7812420Z  * [new branch]              gh/c00w/53/head             -> origin/gh/c00w/53/head
2025-12-04T08:57:05.7813930Z  * [new branch]              gh/c00w/53/orig             -> origin/gh/c00w/53/orig
2025-12-04T08:57:05.7815969Z  * [new branch]              gh/c00w/54/base             -> origin/gh/c00w/54/base
2025-12-04T08:57:05.7817514Z  * [new branch]              gh/c00w/54/head             -> origin/gh/c00w/54/head
2025-12-04T08:57:05.7819342Z  * [new branch]              gh/c00w/54/orig             -> origin/gh/c00w/54/orig
2025-12-04T08:57:05.7821739Z  * [new branch]              gh/c00w/56/base             -> origin/gh/c00w/56/base
2025-12-04T08:57:05.7823313Z  * [new branch]              gh/c00w/56/head             -> origin/gh/c00w/56/head
2025-12-04T08:57:05.7824899Z  * [new branch]              gh/c00w/56/orig             -> origin/gh/c00w/56/orig
2025-12-04T08:57:05.7827170Z  * [new branch]              gh/c00w/57/base             -> origin/gh/c00w/57/base
2025-12-04T08:57:05.7828452Z  * [new branch]              gh/c00w/57/head             -> origin/gh/c00w/57/head
2025-12-04T08:57:05.7830227Z  * [new branch]              gh/c00w/57/orig             -> origin/gh/c00w/57/orig
2025-12-04T08:57:05.7832308Z  * [new branch]              gh/c00w/58/base             -> origin/gh/c00w/58/base
2025-12-04T08:57:05.7833887Z  * [new branch]              gh/c00w/58/head             -> origin/gh/c00w/58/head
2025-12-04T08:57:05.7835473Z  * [new branch]              gh/c00w/58/orig             -> origin/gh/c00w/58/orig
2025-12-04T08:57:05.7838168Z  * [new branch]              gh/clee2000/1/base          -> origin/gh/clee2000/1/base
2025-12-04T08:57:05.7839805Z  * [new branch]              gh/clee2000/1/head          -> origin/gh/clee2000/1/head
2025-12-04T08:57:05.7841679Z  * [new branch]              gh/clee2000/1/orig          -> origin/gh/clee2000/1/orig
2025-12-04T08:57:05.7844311Z  * [new branch]              gh/coconutruben/1/base      -> origin/gh/coconutruben/1/base
2025-12-04T08:57:05.7845947Z  * [new branch]              gh/coconutruben/1/head      -> origin/gh/coconutruben/1/head
2025-12-04T08:57:05.7848352Z  * [new branch]              gh/coconutruben/55/base     -> origin/gh/coconutruben/55/base
2025-12-04T08:57:05.7849961Z  * [new branch]              gh/coconutruben/55/head     -> origin/gh/coconutruben/55/head
2025-12-04T08:57:05.7851592Z  * [new branch]              gh/coconutruben/55/orig     -> origin/gh/coconutruben/55/orig
2025-12-04T08:57:05.7853806Z  * [new branch]              gh/coconutruben/57/base     -> origin/gh/coconutruben/57/base
2025-12-04T08:57:05.7855505Z  * [new branch]              gh/coconutruben/57/head     -> origin/gh/coconutruben/57/head
2025-12-04T08:57:05.7857153Z  * [new branch]              gh/coconutruben/57/orig     -> origin/gh/coconutruben/57/orig
2025-12-04T08:57:05.7859302Z  * [new branch]              gh/coconutruben/70/base     -> origin/gh/coconutruben/70/base
2025-12-04T08:57:05.7860967Z  * [new branch]              gh/coconutruben/70/head     -> origin/gh/coconutruben/70/head
2025-12-04T08:57:05.7862618Z  * [new branch]              gh/coconutruben/70/orig     -> origin/gh/coconutruben/70/orig
2025-12-04T08:57:05.7864603Z  * [new branch]              gh/coconutruben/71/base     -> origin/gh/coconutruben/71/base
2025-12-04T08:57:05.7866149Z  * [new branch]              gh/coconutruben/71/head     -> origin/gh/coconutruben/71/head
2025-12-04T08:57:05.7867862Z  * [new branch]              gh/coconutruben/71/orig     -> origin/gh/coconutruben/71/orig
2025-12-04T08:57:05.7869900Z  * [new branch]              gh/coconutruben/72/base     -> origin/gh/coconutruben/72/base
2025-12-04T08:57:05.7871590Z  * [new branch]              gh/coconutruben/72/head     -> origin/gh/coconutruben/72/head
2025-12-04T08:57:05.7873429Z  * [new branch]              gh/coconutruben/72/orig     -> origin/gh/coconutruben/72/orig
2025-12-04T08:57:05.7875255Z  * [new branch]              gh/coconutruben/73/base     -> origin/gh/coconutruben/73/base
2025-12-04T08:57:05.7876898Z  * [new branch]              gh/coconutruben/73/head     -> origin/gh/coconutruben/73/head
2025-12-04T08:57:05.7878465Z  * [new branch]              gh/coconutruben/73/orig     -> origin/gh/coconutruben/73/orig
2025-12-04T08:57:05.7880869Z  * [new branch]              gh/coconutruben/74/base     -> origin/gh/coconutruben/74/base
2025-12-04T08:57:05.7882552Z  * [new branch]              gh/coconutruben/74/head     -> origin/gh/coconutruben/74/head
2025-12-04T08:57:05.7884181Z  * [new branch]              gh/coconutruben/74/orig     -> origin/gh/coconutruben/74/orig
2025-12-04T08:57:05.7886439Z  * [new branch]              gh/coconutruben/79/base     -> origin/gh/coconutruben/79/base
2025-12-04T08:57:05.7888047Z  * [new branch]              gh/coconutruben/79/head     -> origin/gh/coconutruben/79/head
2025-12-04T08:57:05.7889798Z  * [new branch]              gh/coconutruben/79/orig     -> origin/gh/coconutruben/79/orig
2025-12-04T08:57:05.7892122Z  * [new branch]              gh/coconutruben/80/base     -> origin/gh/coconutruben/80/base
2025-12-04T08:57:05.7893639Z  * [new branch]              gh/coconutruben/80/head     -> origin/gh/coconutruben/80/head
2025-12-04T08:57:05.7895220Z  * [new branch]              gh/coconutruben/80/orig     -> origin/gh/coconutruben/80/orig
2025-12-04T08:57:05.7897483Z  * [new branch]              gh/coconutruben/82/base     -> origin/gh/coconutruben/82/base
2025-12-04T08:57:05.7899089Z  * [new branch]              gh/coconutruben/82/head     -> origin/gh/coconutruben/82/head
2025-12-04T08:57:05.7900636Z  * [new branch]              gh/coconutruben/82/orig     -> origin/gh/coconutruben/82/orig
2025-12-04T08:57:05.7902987Z  * [new branch]              gh/coconutruben/83/base     -> origin/gh/coconutruben/83/base
2025-12-04T08:57:05.7904519Z  * [new branch]              gh/coconutruben/83/head     -> origin/gh/coconutruben/83/head
2025-12-04T08:57:05.7906122Z  * [new branch]              gh/coconutruben/83/orig     -> origin/gh/coconutruben/83/orig
2025-12-04T08:57:05.7908359Z  * [new branch]              gh/coconutruben/84/base     -> origin/gh/coconutruben/84/base
2025-12-04T08:57:05.7910034Z  * [new branch]              gh/coconutruben/84/head     -> origin/gh/coconutruben/84/head
2025-12-04T08:57:05.7911678Z  * [new branch]              gh/coconutruben/84/orig     -> origin/gh/coconutruben/84/orig
2025-12-04T08:57:05.7913780Z  * [new branch]              gh/coconutruben/85/base     -> origin/gh/coconutruben/85/base
2025-12-04T08:57:05.7915498Z  * [new branch]              gh/coconutruben/85/head     -> origin/gh/coconutruben/85/head
2025-12-04T08:57:05.7917272Z  * [new branch]              gh/coconutruben/85/orig     -> origin/gh/coconutruben/85/orig
2025-12-04T08:57:05.7921909Z  * [new branch]              gh/coconutruben/86/base     -> origin/gh/coconutruben/86/base
2025-12-04T08:57:05.7923468Z  * [new branch]              gh/coconutruben/86/head     -> origin/gh/coconutruben/86/head
2025-12-04T08:57:05.7925074Z  * [new branch]              gh/coconutruben/86/orig     -> origin/gh/coconutruben/86/orig
2025-12-04T08:57:05.7927659Z  * [new branch]              gh/colinchan15/1/base       -> origin/gh/colinchan15/1/base
2025-12-04T08:57:05.7929265Z  * [new branch]              gh/colinchan15/1/head       -> origin/gh/colinchan15/1/head
2025-12-04T08:57:05.7931321Z  * [new branch]              gh/colinchan15/2/base       -> origin/gh/colinchan15/2/base
2025-12-04T08:57:05.7932895Z  * [new branch]              gh/colinchan15/2/head       -> origin/gh/colinchan15/2/head
2025-12-04T08:57:05.7934931Z  * [new branch]              gh/colinchan15/3/base       -> origin/gh/colinchan15/3/base
2025-12-04T08:57:05.7936475Z  * [new branch]              gh/colinchan15/3/head       -> origin/gh/colinchan15/3/head
2025-12-04T08:57:05.7938513Z  * [new branch]              gh/colinchan15/6/base       -> origin/gh/colinchan15/6/base
2025-12-04T08:57:05.7940023Z  * [new branch]              gh/colinchan15/6/head       -> origin/gh/colinchan15/6/head
2025-12-04T08:57:05.7942584Z  * [new branch]              gh/d4l3k/1/base             -> origin/gh/d4l3k/1/base
2025-12-04T08:57:05.7944206Z  * [new branch]              gh/d4l3k/1/head             -> origin/gh/d4l3k/1/head
2025-12-04T08:57:05.7946274Z  * [new branch]              gh/d4l3k/2/base             -> origin/gh/d4l3k/2/base
2025-12-04T08:57:05.7947913Z  * [new branch]              gh/d4l3k/2/head             -> origin/gh/d4l3k/2/head
2025-12-04T08:57:05.7949458Z  * [new branch]              gh/d4l3k/2/orig             -> origin/gh/d4l3k/2/orig
2025-12-04T08:57:05.7951639Z  * [new branch]              gh/d4l3k/3/base             -> origin/gh/d4l3k/3/base
2025-12-04T08:57:05.7953229Z  * [new branch]              gh/d4l3k/3/head             -> origin/gh/d4l3k/3/head
2025-12-04T08:57:05.7954799Z  * [new branch]              gh/d4l3k/3/orig             -> origin/gh/d4l3k/3/orig
2025-12-04T08:57:05.7957156Z  * [new branch]              gh/d4l3k/4/base             -> origin/gh/d4l3k/4/base
2025-12-04T08:57:05.7958763Z  * [new branch]              gh/d4l3k/4/head             -> origin/gh/d4l3k/4/head
2025-12-04T08:57:05.7960568Z  * [new branch]              gh/d4l3k/4/orig             -> origin/gh/d4l3k/4/orig
2025-12-04T08:57:05.7962595Z  * [new branch]              gh/d4l3k/5/base             -> origin/gh/d4l3k/5/base
2025-12-04T08:57:05.7964109Z  * [new branch]              gh/d4l3k/5/orig             -> origin/gh/d4l3k/5/orig
2025-12-04T08:57:05.7966822Z  * [new branch]              gh/davidberard98/392/base   -> origin/gh/davidberard98/392/base
2025-12-04T08:57:05.7968412Z  * [new branch]              gh/davidberard98/392/head   -> origin/gh/davidberard98/392/head
2025-12-04T08:57:05.7969874Z  * [new branch]              gh/davidberard98/392/orig   -> origin/gh/davidberard98/392/orig
2025-12-04T08:57:05.7972044Z  * [new branch]              gh/davidberard98/399/base   -> origin/gh/davidberard98/399/base
2025-12-04T08:57:05.7973733Z  * [new branch]              gh/davidberard98/399/head   -> origin/gh/davidberard98/399/head
2025-12-04T08:57:05.7975341Z  * [new branch]              gh/davidberard98/399/orig   -> origin/gh/davidberard98/399/orig
2025-12-04T08:57:05.7977973Z  * [new branch]              gh/desertfire/605/base      -> origin/gh/desertfire/605/base
2025-12-04T08:57:05.7979601Z  * [new branch]              gh/desertfire/605/head      -> origin/gh/desertfire/605/head
2025-12-04T08:57:05.7981487Z  * [new branch]              gh/desertfire/605/orig      -> origin/gh/desertfire/605/orig
2025-12-04T08:57:05.7983620Z  * [new branch]              gh/desertfire/606/base      -> origin/gh/desertfire/606/base
2025-12-04T08:57:05.7985213Z  * [new branch]              gh/desertfire/606/head      -> origin/gh/desertfire/606/head
2025-12-04T08:57:05.7986862Z  * [new branch]              gh/desertfire/606/orig      -> origin/gh/desertfire/606/orig
2025-12-04T08:57:05.7988945Z  * [new branch]              gh/desertfire/607/base      -> origin/gh/desertfire/607/base
2025-12-04T08:57:05.7990422Z  * [new branch]              gh/desertfire/607/head      -> origin/gh/desertfire/607/head
2025-12-04T08:57:05.7992035Z  * [new branch]              gh/desertfire/607/orig      -> origin/gh/desertfire/607/orig
2025-12-04T08:57:05.7994177Z  * [new branch]              gh/desertfire/608/base      -> origin/gh/desertfire/608/base
2025-12-04T08:57:05.7995752Z  * [new branch]              gh/desertfire/608/head      -> origin/gh/desertfire/608/head
2025-12-04T08:57:05.7997362Z  * [new branch]              gh/desertfire/608/orig      -> origin/gh/desertfire/608/orig
2025-12-04T08:57:05.7999556Z  * [new branch]              gh/desertfire/609/base      -> origin/gh/desertfire/609/base
2025-12-04T08:57:05.8001284Z  * [new branch]              gh/desertfire/609/head      -> origin/gh/desertfire/609/head
2025-12-04T08:57:05.8002860Z  * [new branch]              gh/desertfire/609/orig      -> origin/gh/desertfire/609/orig
2025-12-04T08:57:05.8005168Z  * [new branch]              gh/desertfire/610/base      -> origin/gh/desertfire/610/base
2025-12-04T08:57:05.8006769Z  * [new branch]              gh/desertfire/610/head      -> origin/gh/desertfire/610/head
2025-12-04T08:57:05.8008491Z  * [new branch]              gh/desertfire/610/orig      -> origin/gh/desertfire/610/orig
2025-12-04T08:57:05.8010452Z  * [new branch]              gh/desertfire/611/base      -> origin/gh/desertfire/611/base
2025-12-04T08:57:05.8012077Z  * [new branch]              gh/desertfire/611/head      -> origin/gh/desertfire/611/head
2025-12-04T08:57:05.8013716Z  * [new branch]              gh/desertfire/611/orig      -> origin/gh/desertfire/611/orig
2025-12-04T08:57:05.8015903Z  * [new branch]              gh/desertfire/612/base      -> origin/gh/desertfire/612/base
2025-12-04T08:57:05.8017731Z  * [new branch]              gh/desertfire/612/head      -> origin/gh/desertfire/612/head
2025-12-04T08:57:05.8019442Z  * [new branch]              gh/desertfire/612/orig      -> origin/gh/desertfire/612/orig
2025-12-04T08:57:05.8022015Z  * [new branch]              gh/desertfire/613/base      -> origin/gh/desertfire/613/base
2025-12-04T08:57:05.8023768Z  * [new branch]              gh/desertfire/613/head      -> origin/gh/desertfire/613/head
2025-12-04T08:57:05.8025355Z  * [new branch]              gh/desertfire/613/orig      -> origin/gh/desertfire/613/orig
2025-12-04T08:57:05.8027567Z  * [new branch]              gh/desertfire/614/base      -> origin/gh/desertfire/614/base
2025-12-04T08:57:05.8029242Z  * [new branch]              gh/desertfire/614/head      -> origin/gh/desertfire/614/head
2025-12-04T08:57:05.8030851Z  * [new branch]              gh/desertfire/614/orig      -> origin/gh/desertfire/614/orig
2025-12-04T08:57:05.8033001Z  * [new branch]              gh/desertfire/615/base      -> origin/gh/desertfire/615/base
2025-12-04T08:57:05.8034780Z  * [new branch]              gh/desertfire/615/head      -> origin/gh/desertfire/615/head
2025-12-04T08:57:05.8036399Z  * [new branch]              gh/desertfire/615/orig      -> origin/gh/desertfire/615/orig
2025-12-04T08:57:05.8038369Z  * [new branch]              gh/desertfire/616/base      -> origin/gh/desertfire/616/base
2025-12-04T08:57:05.8039976Z  * [new branch]              gh/desertfire/616/head      -> origin/gh/desertfire/616/head
2025-12-04T08:57:05.8041586Z  * [new branch]              gh/desertfire/616/orig      -> origin/gh/desertfire/616/orig
2025-12-04T08:57:05.8043648Z  * [new branch]              gh/desertfire/617/base      -> origin/gh/desertfire/617/base
2025-12-04T08:57:05.8045347Z  * [new branch]              gh/desertfire/617/head      -> origin/gh/desertfire/617/head
2025-12-04T08:57:05.8046851Z  * [new branch]              gh/desertfire/617/orig      -> origin/gh/desertfire/617/orig
2025-12-04T08:57:05.8049450Z  * [new branch]              gh/dharakk/1/base           -> origin/gh/dharakk/1/base
2025-12-04T08:57:05.8051086Z  * [new branch]              gh/dharakk/1/head           -> origin/gh/dharakk/1/head
2025-12-04T08:57:05.8053641Z  * [new branch]              gh/drisspg/170/base         -> origin/gh/drisspg/170/base
2025-12-04T08:57:05.8055238Z  * [new branch]              gh/drisspg/170/head         -> origin/gh/drisspg/170/head
2025-12-04T08:57:05.8056842Z  * [new branch]              gh/drisspg/170/orig         -> origin/gh/drisspg/170/orig
2025-12-04T08:57:05.8058977Z  * [new branch]              gh/drisspg/182/base         -> origin/gh/drisspg/182/base
2025-12-04T08:57:05.8060632Z  * [new branch]              gh/drisspg/182/head         -> origin/gh/drisspg/182/head
2025-12-04T08:57:05.8062688Z  * [new branch]              gh/drisspg/183/base         -> origin/gh/drisspg/183/base
2025-12-04T08:57:05.8064195Z  * [new branch]              gh/drisspg/183/head         -> origin/gh/drisspg/183/head
2025-12-04T08:57:05.8066232Z  * [new branch]              gh/drisspg/184/base         -> origin/gh/drisspg/184/base
2025-12-04T08:57:05.8067745Z  * [new branch]              gh/drisspg/184/head         -> origin/gh/drisspg/184/head
2025-12-04T08:57:05.8069872Z  * [new branch]              gh/drisspg/185/base         -> origin/gh/drisspg/185/base
2025-12-04T08:57:05.8071527Z  * [new branch]              gh/drisspg/185/head         -> origin/gh/drisspg/185/head
2025-12-04T08:57:05.8073593Z  * [new branch]              gh/drisspg/194/base         -> origin/gh/drisspg/194/base
2025-12-04T08:57:05.8075158Z  * [new branch]              gh/drisspg/194/head         -> origin/gh/drisspg/194/head
2025-12-04T08:57:05.8076818Z  * [new branch]              gh/drisspg/194/orig         -> origin/gh/drisspg/194/orig
2025-12-04T08:57:05.8078897Z  * [new branch]              gh/drisspg/200/base         -> origin/gh/drisspg/200/base
2025-12-04T08:57:05.8080561Z  * [new branch]              gh/drisspg/200/head         -> origin/gh/drisspg/200/head
2025-12-04T08:57:05.8082321Z  * [new branch]              gh/drisspg/200/orig         -> origin/gh/drisspg/200/orig
2025-12-04T08:57:05.8084341Z  * [new branch]              gh/drisspg/218/base         -> origin/gh/drisspg/218/base
2025-12-04T08:57:05.8086035Z  * [new branch]              gh/drisspg/218/head         -> origin/gh/drisspg/218/head
2025-12-04T08:57:05.8087549Z  * [new branch]              gh/drisspg/218/orig         -> origin/gh/drisspg/218/orig
2025-12-04T08:57:05.8089627Z  * [new branch]              gh/drisspg/219/base         -> origin/gh/drisspg/219/base
2025-12-04T08:57:05.8091233Z  * [new branch]              gh/drisspg/219/head         -> origin/gh/drisspg/219/head
2025-12-04T08:57:05.8092827Z  * [new branch]              gh/drisspg/219/orig         -> origin/gh/drisspg/219/orig
2025-12-04T08:57:05.8094998Z  * [new branch]              gh/drisspg/220/base         -> origin/gh/drisspg/220/base
2025-12-04T08:57:05.8096624Z  * [new branch]              gh/drisspg/220/head         -> origin/gh/drisspg/220/head
2025-12-04T08:57:05.8098228Z  * [new branch]              gh/drisspg/220/orig         -> origin/gh/drisspg/220/orig
2025-12-04T08:57:05.8100273Z  * [new branch]              gh/drisspg/221/base         -> origin/gh/drisspg/221/base
2025-12-04T08:57:05.8101868Z  * [new branch]              gh/drisspg/221/head         -> origin/gh/drisspg/221/head
2025-12-04T08:57:05.8103964Z  * [new branch]              gh/drisspg/221/orig         -> origin/gh/drisspg/221/orig
2025-12-04T08:57:05.8106066Z  * [new branch]              gh/drisspg/222/base         -> origin/gh/drisspg/222/base
2025-12-04T08:57:05.8107661Z  * [new branch]              gh/drisspg/222/head         -> origin/gh/drisspg/222/head
2025-12-04T08:57:05.8109253Z  * [new branch]              gh/drisspg/222/orig         -> origin/gh/drisspg/222/orig
2025-12-04T08:57:05.8111435Z  * [new branch]              gh/drisspg/223/base         -> origin/gh/drisspg/223/base
2025-12-04T08:57:05.8113026Z  * [new branch]              gh/drisspg/223/head         -> origin/gh/drisspg/223/head
2025-12-04T08:57:05.8114578Z  * [new branch]              gh/drisspg/223/orig         -> origin/gh/drisspg/223/orig
2025-12-04T08:57:05.8116724Z  * [new branch]              gh/drisspg/224/base         -> origin/gh/drisspg/224/base
2025-12-04T08:57:05.8118536Z  * [new branch]              gh/drisspg/224/head         -> origin/gh/drisspg/224/head
2025-12-04T08:57:05.8132758Z  * [new branch]              gh/drisspg/224/orig         -> origin/gh/drisspg/224/orig
2025-12-04T08:57:05.8133061Z  * [new branch]              gh/drisspg/225/base         -> origin/gh/drisspg/225/base
2025-12-04T08:57:05.8133258Z  * [new branch]              gh/drisspg/225/head         -> origin/gh/drisspg/225/head
2025-12-04T08:57:05.8133416Z  * [new branch]              gh/drisspg/225/orig         -> origin/gh/drisspg/225/orig
2025-12-04T08:57:05.8133570Z  * [new branch]              gh/drisspg/226/base         -> origin/gh/drisspg/226/base
2025-12-04T08:57:05.8133738Z  * [new branch]              gh/drisspg/226/head         -> origin/gh/drisspg/226/head
2025-12-04T08:57:05.8133887Z  * [new branch]              gh/drisspg/226/orig         -> origin/gh/drisspg/226/orig
2025-12-04T08:57:05.8134047Z  * [new branch]              gh/drisspg/227/base         -> origin/gh/drisspg/227/base
2025-12-04T08:57:05.8135236Z  * [new branch]              gh/drisspg/227/head         -> origin/gh/drisspg/227/head
2025-12-04T08:57:05.8136787Z  * [new branch]              gh/drisspg/227/orig         -> origin/gh/drisspg/227/orig
2025-12-04T08:57:05.8139022Z  * [new branch]              gh/drisspg/228/base         -> origin/gh/drisspg/228/base
2025-12-04T08:57:05.8140812Z  * [new branch]              gh/drisspg/228/head         -> origin/gh/drisspg/228/head
2025-12-04T08:57:05.8142203Z  * [new branch]              gh/drisspg/228/orig         -> origin/gh/drisspg/228/orig
2025-12-04T08:57:05.8144352Z  * [new branch]              gh/drisspg/229/base         -> origin/gh/drisspg/229/base
2025-12-04T08:57:05.8145950Z  * [new branch]              gh/drisspg/229/head         -> origin/gh/drisspg/229/head
2025-12-04T08:57:05.8147612Z  * [new branch]              gh/drisspg/229/orig         -> origin/gh/drisspg/229/orig
2025-12-04T08:57:05.8150088Z  * [new branch]              gh/drisspg/230/base         -> origin/gh/drisspg/230/base
2025-12-04T08:57:05.8151433Z  * [new branch]              gh/drisspg/230/head         -> origin/gh/drisspg/230/head
2025-12-04T08:57:05.8153050Z  * [new branch]              gh/drisspg/230/orig         -> origin/gh/drisspg/230/orig
2025-12-04T08:57:05.8155632Z  * [new branch]              gh/dsjohns2/1/base          -> origin/gh/dsjohns2/1/base
2025-12-04T08:57:05.8157275Z  * [new branch]              gh/dsjohns2/1/head          -> origin/gh/dsjohns2/1/head
2025-12-04T08:57:05.8159824Z  * [new branch]              gh/dzmitry-huba/1/base      -> origin/gh/dzmitry-huba/1/base
2025-12-04T08:57:05.8161623Z  * [new branch]              gh/dzmitry-huba/1/head      -> origin/gh/dzmitry-huba/1/head
2025-12-04T08:57:05.8163838Z  * [new branch]              gh/dzmitry-huba/12/base     -> origin/gh/dzmitry-huba/12/base
2025-12-04T08:57:05.8165621Z  * [new branch]              gh/dzmitry-huba/12/head     -> origin/gh/dzmitry-huba/12/head
2025-12-04T08:57:05.8167339Z  * [new branch]              gh/dzmitry-huba/12/orig     -> origin/gh/dzmitry-huba/12/orig
2025-12-04T08:57:05.8169622Z  * [new branch]              gh/dzmitry-huba/13/base     -> origin/gh/dzmitry-huba/13/base
2025-12-04T08:57:05.8171271Z  * [new branch]              gh/dzmitry-huba/13/head     -> origin/gh/dzmitry-huba/13/head
2025-12-04T08:57:05.8172857Z  * [new branch]              gh/dzmitry-huba/13/orig     -> origin/gh/dzmitry-huba/13/orig
2025-12-04T08:57:05.8174938Z  * [new branch]              gh/dzmitry-huba/14/base     -> origin/gh/dzmitry-huba/14/base
2025-12-04T08:57:05.8176570Z  * [new branch]              gh/dzmitry-huba/14/head     -> origin/gh/dzmitry-huba/14/head
2025-12-04T08:57:05.8178157Z  * [new branch]              gh/dzmitry-huba/14/orig     -> origin/gh/dzmitry-huba/14/orig
2025-12-04T08:57:05.8180356Z  * [new branch]              gh/dzmitry-huba/15/base     -> origin/gh/dzmitry-huba/15/base
2025-12-04T08:57:05.8181951Z  * [new branch]              gh/dzmitry-huba/15/head     -> origin/gh/dzmitry-huba/15/head
2025-12-04T08:57:05.8183469Z  * [new branch]              gh/dzmitry-huba/15/orig     -> origin/gh/dzmitry-huba/15/orig
2025-12-04T08:57:05.8185725Z  * [new branch]              gh/dzmitry-huba/16/base     -> origin/gh/dzmitry-huba/16/base
2025-12-04T08:57:05.8187434Z  * [new branch]              gh/dzmitry-huba/16/head     -> origin/gh/dzmitry-huba/16/head
2025-12-04T08:57:05.8189158Z  * [new branch]              gh/dzmitry-huba/16/orig     -> origin/gh/dzmitry-huba/16/orig
2025-12-04T08:57:05.8191472Z  * [new branch]              gh/dzmitry-huba/17/base     -> origin/gh/dzmitry-huba/17/base
2025-12-04T08:57:05.8193074Z  * [new branch]              gh/dzmitry-huba/17/head     -> origin/gh/dzmitry-huba/17/head
2025-12-04T08:57:05.8194660Z  * [new branch]              gh/dzmitry-huba/17/orig     -> origin/gh/dzmitry-huba/17/orig
2025-12-04T08:57:05.8196607Z  * [new branch]              gh/dzmitry-huba/2/base      -> origin/gh/dzmitry-huba/2/base
2025-12-04T08:57:05.8198160Z  * [new branch]              gh/dzmitry-huba/2/head      -> origin/gh/dzmitry-huba/2/head
2025-12-04T08:57:05.8200125Z  * [new branch]              gh/dzmitry-huba/3/base      -> origin/gh/dzmitry-huba/3/base
2025-12-04T08:57:05.8201696Z  * [new branch]              gh/dzmitry-huba/3/head      -> origin/gh/dzmitry-huba/3/head
2025-12-04T08:57:05.8204388Z  * [new branch]              gh/eellison/808/base        -> origin/gh/eellison/808/base
2025-12-04T08:57:05.8205996Z  * [new branch]              gh/eellison/808/head        -> origin/gh/eellison/808/head
2025-12-04T08:57:05.8207607Z  * [new branch]              gh/eellison/808/orig        -> origin/gh/eellison/808/orig
2025-12-04T08:57:05.8210019Z  * [new branch]              gh/eellison/822/base        -> origin/gh/eellison/822/base
2025-12-04T08:57:05.8211609Z  * [new branch]              gh/eellison/822/head        -> origin/gh/eellison/822/head
2025-12-04T08:57:05.8213426Z  * [new branch]              gh/eellison/822/orig        -> origin/gh/eellison/822/orig
2025-12-04T08:57:05.8215482Z  * [new branch]              gh/eellison/823/base        -> origin/gh/eellison/823/base
2025-12-04T08:57:05.8217253Z  * [new branch]              gh/eellison/823/head        -> origin/gh/eellison/823/head
2025-12-04T08:57:05.8219044Z  * [new branch]              gh/eellison/823/orig        -> origin/gh/eellison/823/orig
2025-12-04T08:57:05.8221116Z  * [new branch]              gh/eellison/862/base        -> origin/gh/eellison/862/base
2025-12-04T08:57:05.8222731Z  * [new branch]              gh/eellison/862/head        -> origin/gh/eellison/862/head
2025-12-04T08:57:05.8224266Z  * [new branch]              gh/eellison/862/orig        -> origin/gh/eellison/862/orig
2025-12-04T08:57:05.8226358Z  * [new branch]              gh/eellison/863/base        -> origin/gh/eellison/863/base
2025-12-04T08:57:05.8227925Z  * [new branch]              gh/eellison/863/head        -> origin/gh/eellison/863/head
2025-12-04T08:57:05.8229473Z  * [new branch]              gh/eellison/863/orig        -> origin/gh/eellison/863/orig
2025-12-04T08:57:05.8231618Z  * [new branch]              gh/eellison/864/base        -> origin/gh/eellison/864/base
2025-12-04T08:57:05.8233251Z  * [new branch]              gh/eellison/864/head        -> origin/gh/eellison/864/head
2025-12-04T08:57:05.8234879Z  * [new branch]              gh/eellison/864/orig        -> origin/gh/eellison/864/orig
2025-12-04T08:57:05.8237108Z  * [new branch]              gh/eellison/865/base        -> origin/gh/eellison/865/base
2025-12-04T08:57:05.8238880Z  * [new branch]              gh/eellison/865/head        -> origin/gh/eellison/865/head
2025-12-04T08:57:05.8240560Z  * [new branch]              gh/eellison/865/orig        -> origin/gh/eellison/865/orig
2025-12-04T08:57:05.8242690Z  * [new branch]              gh/eellison/866/base        -> origin/gh/eellison/866/base
2025-12-04T08:57:05.8244330Z  * [new branch]              gh/eellison/866/head        -> origin/gh/eellison/866/head
2025-12-04T08:57:05.8245883Z  * [new branch]              gh/eellison/866/orig        -> origin/gh/eellison/866/orig
2025-12-04T08:57:05.8248119Z  * [new branch]              gh/eellison/867/base        -> origin/gh/eellison/867/base
2025-12-04T08:57:05.8249548Z  * [new branch]              gh/eellison/867/head        -> origin/gh/eellison/867/head
2025-12-04T08:57:05.8251217Z  * [new branch]              gh/eellison/867/orig        -> origin/gh/eellison/867/orig
2025-12-04T08:57:05.8253555Z  * [new branch]              gh/eellison/868/base        -> origin/gh/eellison/868/base
2025-12-04T08:57:05.8255372Z  * [new branch]              gh/eellison/868/head        -> origin/gh/eellison/868/head
2025-12-04T08:57:05.8256971Z  * [new branch]              gh/eellison/868/orig        -> origin/gh/eellison/868/orig
2025-12-04T08:57:05.8259099Z  * [new branch]              gh/eellison/869/base        -> origin/gh/eellison/869/base
2025-12-04T08:57:05.8260578Z  * [new branch]              gh/eellison/869/head        -> origin/gh/eellison/869/head
2025-12-04T08:57:05.8262133Z  * [new branch]              gh/eellison/869/orig        -> origin/gh/eellison/869/orig
2025-12-04T08:57:05.8264268Z  * [new branch]              gh/eellison/870/base        -> origin/gh/eellison/870/base
2025-12-04T08:57:05.8265791Z  * [new branch]              gh/eellison/870/head        -> origin/gh/eellison/870/head
2025-12-04T08:57:05.8267810Z  * [new branch]              gh/eellison/870/orig        -> origin/gh/eellison/870/orig
2025-12-04T08:57:05.8270069Z  * [new branch]              gh/eellison/871/base        -> origin/gh/eellison/871/base
2025-12-04T08:57:05.8271684Z  * [new branch]              gh/eellison/871/head        -> origin/gh/eellison/871/head
2025-12-04T08:57:05.8273330Z  * [new branch]              gh/eellison/871/orig        -> origin/gh/eellison/871/orig
2025-12-04T08:57:05.8275532Z  * [new branch]              gh/eellison/872/base        -> origin/gh/eellison/872/base
2025-12-04T08:57:05.8277327Z  * [new branch]              gh/eellison/872/head        -> origin/gh/eellison/872/head
2025-12-04T08:57:05.8278824Z  * [new branch]              gh/eellison/872/orig        -> origin/gh/eellison/872/orig
2025-12-04T08:57:05.8281393Z  * [new branch]              gh/eellison/873/base        -> origin/gh/eellison/873/base
2025-12-04T08:57:05.8282693Z  * [new branch]              gh/eellison/873/head        -> origin/gh/eellison/873/head
2025-12-04T08:57:05.8284280Z  * [new branch]              gh/eellison/873/orig        -> origin/gh/eellison/873/orig
2025-12-04T08:57:05.8286419Z  * [new branch]              gh/eellison/874/base        -> origin/gh/eellison/874/base
2025-12-04T08:57:05.8287994Z  * [new branch]              gh/eellison/874/head        -> origin/gh/eellison/874/head
2025-12-04T08:57:05.8289641Z  * [new branch]              gh/eellison/874/orig        -> origin/gh/eellison/874/orig
2025-12-04T08:57:05.8292238Z  * [new branch]              gh/eellison/875/base        -> origin/gh/eellison/875/base
2025-12-04T08:57:05.8294021Z  * [new branch]              gh/eellison/875/head        -> origin/gh/eellison/875/head
2025-12-04T08:57:05.8295599Z  * [new branch]              gh/eellison/875/orig        -> origin/gh/eellison/875/orig
2025-12-04T08:57:05.8297916Z  * [new branch]              gh/eellison/876/base        -> origin/gh/eellison/876/base
2025-12-04T08:57:05.8299496Z  * [new branch]              gh/eellison/876/head        -> origin/gh/eellison/876/head
2025-12-04T08:57:05.8301166Z  * [new branch]              gh/eellison/876/orig        -> origin/gh/eellison/876/orig
2025-12-04T08:57:05.8303394Z  * [new branch]              gh/eellison/877/base        -> origin/gh/eellison/877/base
2025-12-04T08:57:05.8304960Z  * [new branch]              gh/eellison/877/head        -> origin/gh/eellison/877/head
2025-12-04T08:57:05.8306529Z  * [new branch]              gh/eellison/877/orig        -> origin/gh/eellison/877/orig
2025-12-04T08:57:05.8308793Z  * [new branch]              gh/eellison/878/base        -> origin/gh/eellison/878/base
2025-12-04T08:57:05.8310364Z  * [new branch]              gh/eellison/878/head        -> origin/gh/eellison/878/head
2025-12-04T08:57:05.8311920Z  * [new branch]              gh/eellison/878/orig        -> origin/gh/eellison/878/orig
2025-12-04T08:57:05.8314612Z  * [new branch]              gh/eellison/879/base        -> origin/gh/eellison/879/base
2025-12-04T08:57:05.8316242Z  * [new branch]              gh/eellison/879/head        -> origin/gh/eellison/879/head
2025-12-04T08:57:05.8319938Z  * [new branch]              gh/eellison/879/orig        -> origin/gh/eellison/879/orig
2025-12-04T08:57:05.8322213Z  * [new branch]              gh/eellison/880/base        -> origin/gh/eellison/880/base
2025-12-04T08:57:05.8323844Z  * [new branch]              gh/eellison/880/head        -> origin/gh/eellison/880/head
2025-12-04T08:57:05.8325432Z  * [new branch]              gh/eellison/880/orig        -> origin/gh/eellison/880/orig
2025-12-04T08:57:05.8327672Z  * [new branch]              gh/eellison/881/base        -> origin/gh/eellison/881/base
2025-12-04T08:57:05.8329331Z  * [new branch]              gh/eellison/881/head        -> origin/gh/eellison/881/head
2025-12-04T08:57:05.8330968Z  * [new branch]              gh/eellison/881/orig        -> origin/gh/eellison/881/orig
2025-12-04T08:57:05.8333127Z  * [new branch]              gh/eellison/882/base        -> origin/gh/eellison/882/base
2025-12-04T08:57:05.8334718Z  * [new branch]              gh/eellison/882/head        -> origin/gh/eellison/882/head
2025-12-04T08:57:05.8336530Z  * [new branch]              gh/eellison/882/orig        -> origin/gh/eellison/882/orig
2025-12-04T08:57:05.8338531Z  * [new branch]              gh/eellison/883/base        -> origin/gh/eellison/883/base
2025-12-04T08:57:05.8340035Z  * [new branch]              gh/eellison/883/head        -> origin/gh/eellison/883/head
2025-12-04T08:57:05.8341566Z  * [new branch]              gh/eellison/883/orig        -> origin/gh/eellison/883/orig
2025-12-04T08:57:05.8343870Z  * [new branch]              gh/eellison/884/base        -> origin/gh/eellison/884/base
2025-12-04T08:57:05.8345388Z  * [new branch]              gh/eellison/884/head        -> origin/gh/eellison/884/head
2025-12-04T08:57:05.8346924Z  * [new branch]              gh/eellison/884/orig        -> origin/gh/eellison/884/orig
2025-12-04T08:57:05.8349480Z  * [new branch]              gh/etaf/147/base            -> origin/gh/etaf/147/base
2025-12-04T08:57:05.8351013Z  * [new branch]              gh/etaf/147/head            -> origin/gh/etaf/147/head
2025-12-04T08:57:05.8353349Z  * [new branch]              gh/etaf/154/base            -> origin/gh/etaf/154/base
2025-12-04T08:57:05.8354973Z  * [new branch]              gh/etaf/154/head            -> origin/gh/etaf/154/head
2025-12-04T08:57:05.8356565Z  * [new branch]              gh/etaf/154/orig            -> origin/gh/etaf/154/orig
2025-12-04T08:57:05.8358626Z  * [new branch]              gh/etaf/156/base            -> origin/gh/etaf/156/base
2025-12-04T08:57:05.8360381Z  * [new branch]              gh/etaf/156/head            -> origin/gh/etaf/156/head
2025-12-04T08:57:05.8362028Z  * [new branch]              gh/etaf/156/orig            -> origin/gh/etaf/156/orig
2025-12-04T08:57:05.8364367Z  * [new branch]              gh/etaf/157/base            -> origin/gh/etaf/157/base
2025-12-04T08:57:05.8365986Z  * [new branch]              gh/etaf/157/head            -> origin/gh/etaf/157/head
2025-12-04T08:57:05.8367583Z  * [new branch]              gh/etaf/157/orig            -> origin/gh/etaf/157/orig
2025-12-04T08:57:05.8369985Z  * [new branch]              gh/etaf/158/base            -> origin/gh/etaf/158/base
2025-12-04T08:57:05.8371669Z  * [new branch]              gh/etaf/158/head            -> origin/gh/etaf/158/head
2025-12-04T08:57:05.8373311Z  * [new branch]              gh/etaf/158/orig            -> origin/gh/etaf/158/orig
2025-12-04T08:57:05.8375547Z  * [new branch]              gh/etaf/159/base            -> origin/gh/etaf/159/base
2025-12-04T08:57:05.8377590Z  * [new branch]              gh/etaf/159/head            -> origin/gh/etaf/159/head
2025-12-04T08:57:05.8379193Z  * [new branch]              gh/etaf/159/orig            -> origin/gh/etaf/159/orig
2025-12-04T08:57:05.8381706Z  * [new branch]              gh/etaf/160/base            -> origin/gh/etaf/160/base
2025-12-04T08:57:05.8383381Z  * [new branch]              gh/etaf/160/head            -> origin/gh/etaf/160/head
2025-12-04T08:57:05.8385014Z  * [new branch]              gh/etaf/160/orig            -> origin/gh/etaf/160/orig
2025-12-04T08:57:05.8387258Z  * [new branch]              gh/etaf/161/base            -> origin/gh/etaf/161/base
2025-12-04T08:57:05.8388917Z  * [new branch]              gh/etaf/161/head            -> origin/gh/etaf/161/head
2025-12-04T08:57:05.8390646Z  * [new branch]              gh/etaf/161/orig            -> origin/gh/etaf/161/orig
2025-12-04T08:57:05.8392735Z  * [new branch]              gh/etaf/166/base            -> origin/gh/etaf/166/base
2025-12-04T08:57:05.8394472Z  * [new branch]              gh/etaf/166/head            -> origin/gh/etaf/166/head
2025-12-04T08:57:05.8396057Z  * [new branch]              gh/etaf/166/orig            -> origin/gh/etaf/166/orig
2025-12-04T08:57:05.8398120Z  * [new branch]              gh/etaf/167/base            -> origin/gh/etaf/167/base
2025-12-04T08:57:05.8399729Z  * [new branch]              gh/etaf/167/head            -> origin/gh/etaf/167/head
2025-12-04T08:57:05.8401455Z  * [new branch]              gh/etaf/167/orig            -> origin/gh/etaf/167/orig
2025-12-04T08:57:05.8403845Z  * [new branch]              gh/etaf/168/base            -> origin/gh/etaf/168/base
2025-12-04T08:57:05.8405478Z  * [new branch]              gh/etaf/168/head            -> origin/gh/etaf/168/head
2025-12-04T08:57:05.8407081Z  * [new branch]              gh/etaf/168/orig            -> origin/gh/etaf/168/orig
2025-12-04T08:57:05.8409298Z  * [new branch]              gh/etaf/172/base            -> origin/gh/etaf/172/base
2025-12-04T08:57:05.8410982Z  * [new branch]              gh/etaf/172/head            -> origin/gh/etaf/172/head
2025-12-04T08:57:05.8412760Z  * [new branch]              gh/etaf/172/orig            -> origin/gh/etaf/172/orig
2025-12-04T08:57:05.8414927Z  * [new branch]              gh/etaf/173/base            -> origin/gh/etaf/173/base
2025-12-04T08:57:05.8416566Z  * [new branch]              gh/etaf/173/head            -> origin/gh/etaf/173/head
2025-12-04T08:57:05.8418383Z  * [new branch]              gh/etaf/173/orig            -> origin/gh/etaf/173/orig
2025-12-04T08:57:05.8420570Z  * [new branch]              gh/etaf/174/base            -> origin/gh/etaf/174/base
2025-12-04T08:57:05.8422150Z  * [new branch]              gh/etaf/174/head            -> origin/gh/etaf/174/head
2025-12-04T08:57:05.8424300Z  * [new branch]              gh/etaf/175/base            -> origin/gh/etaf/175/base
2025-12-04T08:57:05.8425881Z  * [new branch]              gh/etaf/175/head            -> origin/gh/etaf/175/head
2025-12-04T08:57:05.8427424Z  * [new branch]              gh/etaf/175/orig            -> origin/gh/etaf/175/orig
2025-12-04T08:57:05.8429657Z  * [new branch]              gh/etaf/176/base            -> origin/gh/etaf/176/base
2025-12-04T08:57:05.8431323Z  * [new branch]              gh/etaf/176/head            -> origin/gh/etaf/176/head
2025-12-04T08:57:05.8432928Z  * [new branch]              gh/etaf/176/orig            -> origin/gh/etaf/176/orig
2025-12-04T08:57:05.8435523Z  * [new branch]              gh/etaf/177/base            -> origin/gh/etaf/177/base
2025-12-04T08:57:05.8437275Z  * [new branch]              gh/etaf/177/head            -> origin/gh/etaf/177/head
2025-12-04T08:57:05.8438904Z  * [new branch]              gh/etaf/177/orig            -> origin/gh/etaf/177/orig
2025-12-04T08:57:05.8441366Z  * [new branch]              gh/etaf/178/base            -> origin/gh/etaf/178/base
2025-12-04T08:57:05.8443004Z  * [new branch]              gh/etaf/178/head            -> origin/gh/etaf/178/head
2025-12-04T08:57:05.8444668Z  * [new branch]              gh/etaf/178/orig            -> origin/gh/etaf/178/orig
2025-12-04T08:57:05.8446874Z  * [new branch]              gh/etaf/179/base            -> origin/gh/etaf/179/base
2025-12-04T08:57:05.8448466Z  * [new branch]              gh/etaf/179/head            -> origin/gh/etaf/179/head
2025-12-04T08:57:05.8450012Z  * [new branch]              gh/etaf/179/orig            -> origin/gh/etaf/179/orig
2025-12-04T08:57:05.8452321Z  * [new branch]              gh/etaf/180/base            -> origin/gh/etaf/180/base
2025-12-04T08:57:05.8454237Z  * [new branch]              gh/etaf/180/head            -> origin/gh/etaf/180/head
2025-12-04T08:57:05.8455809Z  * [new branch]              gh/etaf/180/orig            -> origin/gh/etaf/180/orig
2025-12-04T08:57:05.8458433Z  * [new branch]              gh/exclamaforte/1/base      -> origin/gh/exclamaforte/1/base
2025-12-04T08:57:05.8460522Z  * [new branch]              gh/exclamaforte/1/head      -> origin/gh/exclamaforte/1/head
2025-12-04T08:57:05.8463514Z  * [new branch]              gh/exclamaforte/2/base      -> origin/gh/exclamaforte/2/base
2025-12-04T08:57:05.8465008Z  * [new branch]              gh/exclamaforte/2/head      -> origin/gh/exclamaforte/2/head
2025-12-04T08:57:05.8467197Z  * [new branch]              gh/exclamaforte/3/base      -> origin/gh/exclamaforte/3/base
2025-12-04T08:57:05.8468896Z  * [new branch]              gh/exclamaforte/3/head      -> origin/gh/exclamaforte/3/head
2025-12-04T08:57:05.8471048Z  * [new branch]              gh/exclamaforte/4/base      -> origin/gh/exclamaforte/4/base
2025-12-04T08:57:05.8472684Z  * [new branch]              gh/exclamaforte/4/head      -> origin/gh/exclamaforte/4/head
2025-12-04T08:57:05.8475513Z  * [new branch]              gh/ezyang/2374/base         -> origin/gh/ezyang/2374/base
2025-12-04T08:57:05.8477129Z  * [new branch]              gh/ezyang/2374/head         -> origin/gh/ezyang/2374/head
2025-12-04T08:57:05.8478723Z  * [new branch]              gh/ezyang/2374/orig         -> origin/gh/ezyang/2374/orig
2025-12-04T08:57:05.8481129Z  * [new branch]              gh/ezyang/2973/base         -> origin/gh/ezyang/2973/base
2025-12-04T08:57:05.8482607Z  * [new branch]              gh/ezyang/2973/head         -> origin/gh/ezyang/2973/head
2025-12-04T08:57:05.8484179Z  * [new branch]              gh/ezyang/2973/orig         -> origin/gh/ezyang/2973/orig
2025-12-04T08:57:05.8486287Z  * [new branch]              gh/ezyang/2974/base         -> origin/gh/ezyang/2974/base
2025-12-04T08:57:05.8487843Z  * [new branch]              gh/ezyang/2974/head         -> origin/gh/ezyang/2974/head
2025-12-04T08:57:05.8489321Z  * [new branch]              gh/ezyang/2974/orig         -> origin/gh/ezyang/2974/orig
2025-12-04T08:57:05.8491444Z  * [new branch]              gh/ezyang/3131/base         -> origin/gh/ezyang/3131/base
2025-12-04T08:57:05.8493063Z  * [new branch]              gh/ezyang/3131/head         -> origin/gh/ezyang/3131/head
2025-12-04T08:57:05.8494737Z  * [new branch]              gh/ezyang/3131/orig         -> origin/gh/ezyang/3131/orig
2025-12-04T08:57:05.8496841Z  * [new branch]              gh/ezyang/3139/base         -> origin/gh/ezyang/3139/base
2025-12-04T08:57:05.8498469Z  * [new branch]              gh/ezyang/3139/head         -> origin/gh/ezyang/3139/head
2025-12-04T08:57:05.8500012Z  * [new branch]              gh/ezyang/3139/orig         -> origin/gh/ezyang/3139/orig
2025-12-04T08:57:05.8502059Z  * [new branch]              gh/ezyang/3140/base         -> origin/gh/ezyang/3140/base
2025-12-04T08:57:05.8503609Z  * [new branch]              gh/ezyang/3140/head         -> origin/gh/ezyang/3140/head
2025-12-04T08:57:05.8505209Z  * [new branch]              gh/ezyang/3140/orig         -> origin/gh/ezyang/3140/orig
2025-12-04T08:57:05.8507345Z  * [new branch]              gh/ezyang/3143/base         -> origin/gh/ezyang/3143/base
2025-12-04T08:57:05.8508913Z  * [new branch]              gh/ezyang/3143/head         -> origin/gh/ezyang/3143/head
2025-12-04T08:57:05.8510544Z  * [new branch]              gh/ezyang/3143/orig         -> origin/gh/ezyang/3143/orig
2025-12-04T08:57:05.8512680Z  * [new branch]              gh/ezyang/3144/base         -> origin/gh/ezyang/3144/base
2025-12-04T08:57:05.8514268Z  * [new branch]              gh/ezyang/3144/head         -> origin/gh/ezyang/3144/head
2025-12-04T08:57:05.8515969Z  * [new branch]              gh/ezyang/3144/orig         -> origin/gh/ezyang/3144/orig
2025-12-04T08:57:05.8518347Z  * [new branch]              gh/ezyang/3167/base         -> origin/gh/ezyang/3167/base
2025-12-04T08:57:05.8519961Z  * [new branch]              gh/ezyang/3167/head         -> origin/gh/ezyang/3167/head
2025-12-04T08:57:05.8521622Z  * [new branch]              gh/ezyang/3167/orig         -> origin/gh/ezyang/3167/orig
2025-12-04T08:57:05.8523757Z  * [new branch]              gh/ezyang/3173/base         -> origin/gh/ezyang/3173/base
2025-12-04T08:57:05.8525313Z  * [new branch]              gh/ezyang/3173/head         -> origin/gh/ezyang/3173/head
2025-12-04T08:57:05.8527003Z  * [new branch]              gh/ezyang/3173/orig         -> origin/gh/ezyang/3173/orig
2025-12-04T08:57:05.8529102Z  * [new branch]              gh/ezyang/3175/base         -> origin/gh/ezyang/3175/base
2025-12-04T08:57:05.8530779Z  * [new branch]              gh/ezyang/3175/head         -> origin/gh/ezyang/3175/head
2025-12-04T08:57:05.8532280Z  * [new branch]              gh/ezyang/3175/orig         -> origin/gh/ezyang/3175/orig
2025-12-04T08:57:05.8534433Z  * [new branch]              gh/ezyang/3182/base         -> origin/gh/ezyang/3182/base
2025-12-04T08:57:05.8536055Z  * [new branch]              gh/ezyang/3182/head         -> origin/gh/ezyang/3182/head
2025-12-04T08:57:05.8537680Z  * [new branch]              gh/ezyang/3182/orig         -> origin/gh/ezyang/3182/orig
2025-12-04T08:57:05.8539741Z  * [new branch]              gh/ezyang/3185/base         -> origin/gh/ezyang/3185/base
2025-12-04T08:57:05.8541402Z  * [new branch]              gh/ezyang/3185/head         -> origin/gh/ezyang/3185/head
2025-12-04T08:57:05.8542958Z  * [new branch]              gh/ezyang/3185/orig         -> origin/gh/ezyang/3185/orig
2025-12-04T08:57:05.8545187Z  * [new branch]              gh/ezyang/3189/base         -> origin/gh/ezyang/3189/base
2025-12-04T08:57:05.8546658Z  * [new branch]              gh/ezyang/3189/head         -> origin/gh/ezyang/3189/head
2025-12-04T08:57:05.8548197Z  * [new branch]              gh/ezyang/3189/orig         -> origin/gh/ezyang/3189/orig
2025-12-04T08:57:05.8550230Z  * [new branch]              gh/ezyang/3191/base         -> origin/gh/ezyang/3191/base
2025-12-04T08:57:05.8551813Z  * [new branch]              gh/ezyang/3191/head         -> origin/gh/ezyang/3191/head
2025-12-04T08:57:05.8553446Z  * [new branch]              gh/ezyang/3191/orig         -> origin/gh/ezyang/3191/orig
2025-12-04T08:57:05.8556011Z  * [new branch]              gh/ezyang/3192/base         -> origin/gh/ezyang/3192/base
2025-12-04T08:57:05.8557571Z  * [new branch]              gh/ezyang/3192/head         -> origin/gh/ezyang/3192/head
2025-12-04T08:57:05.8559270Z  * [new branch]              gh/ezyang/3192/orig         -> origin/gh/ezyang/3192/orig
2025-12-04T08:57:05.8561570Z  * [new branch]              gh/ezyang/3193/base         -> origin/gh/ezyang/3193/base
2025-12-04T08:57:05.8563116Z  * [new branch]              gh/ezyang/3193/head         -> origin/gh/ezyang/3193/head
2025-12-04T08:57:05.8564749Z  * [new branch]              gh/ezyang/3193/orig         -> origin/gh/ezyang/3193/orig
2025-12-04T08:57:05.8567433Z  * [new branch]              gh/ezyang/3194/base         -> origin/gh/ezyang/3194/base
2025-12-04T08:57:05.8568995Z  * [new branch]              gh/ezyang/3194/head         -> origin/gh/ezyang/3194/head
2025-12-04T08:57:05.8570654Z  * [new branch]              gh/ezyang/3194/orig         -> origin/gh/ezyang/3194/orig
2025-12-04T08:57:05.8572803Z  * [new branch]              gh/ezyang/3195/base         -> origin/gh/ezyang/3195/base
2025-12-04T08:57:05.8574442Z  * [new branch]              gh/ezyang/3195/head         -> origin/gh/ezyang/3195/head
2025-12-04T08:57:05.8576032Z  * [new branch]              gh/ezyang/3195/orig         -> origin/gh/ezyang/3195/orig
2025-12-04T08:57:05.8578218Z  * [new branch]              gh/ezyang/3196/base         -> origin/gh/ezyang/3196/base
2025-12-04T08:57:05.8579804Z  * [new branch]              gh/ezyang/3196/head         -> origin/gh/ezyang/3196/head
2025-12-04T08:57:05.8581449Z  * [new branch]              gh/ezyang/3196/orig         -> origin/gh/ezyang/3196/orig
2025-12-04T08:57:05.8583752Z  * [new branch]              gh/ezyang/3197/base         -> origin/gh/ezyang/3197/base
2025-12-04T08:57:05.8585344Z  * [new branch]              gh/ezyang/3197/head         -> origin/gh/ezyang/3197/head
2025-12-04T08:57:05.8586941Z  * [new branch]              gh/ezyang/3197/orig         -> origin/gh/ezyang/3197/orig
2025-12-04T08:57:05.8589102Z  * [new branch]              gh/ezyang/3198/base         -> origin/gh/ezyang/3198/base
2025-12-04T08:57:05.8590730Z  * [new branch]              gh/ezyang/3198/head         -> origin/gh/ezyang/3198/head
2025-12-04T08:57:05.8592359Z  * [new branch]              gh/ezyang/3198/orig         -> origin/gh/ezyang/3198/orig
2025-12-04T08:57:05.8594585Z  * [new branch]              gh/ezyang/3199/base         -> origin/gh/ezyang/3199/base
2025-12-04T08:57:05.8596165Z  * [new branch]              gh/ezyang/3199/head         -> origin/gh/ezyang/3199/head
2025-12-04T08:57:05.8597813Z  * [new branch]              gh/ezyang/3199/orig         -> origin/gh/ezyang/3199/orig
2025-12-04T08:57:05.8599982Z  * [new branch]              gh/ezyang/3200/base         -> origin/gh/ezyang/3200/base
2025-12-04T08:57:05.8601602Z  * [new branch]              gh/ezyang/3200/head         -> origin/gh/ezyang/3200/head
2025-12-04T08:57:05.8603232Z  * [new branch]              gh/ezyang/3200/orig         -> origin/gh/ezyang/3200/orig
2025-12-04T08:57:05.8605477Z  * [new branch]              gh/ezyang/3201/base         -> origin/gh/ezyang/3201/base
2025-12-04T08:57:05.8607031Z  * [new branch]              gh/ezyang/3201/head         -> origin/gh/ezyang/3201/head
2025-12-04T08:57:05.8608644Z  * [new branch]              gh/ezyang/3201/orig         -> origin/gh/ezyang/3201/orig
2025-12-04T08:57:05.8610961Z  * [new branch]              gh/ezyang/3202/base         -> origin/gh/ezyang/3202/base
2025-12-04T08:57:05.8612302Z  * [new branch]              gh/ezyang/3202/head         -> origin/gh/ezyang/3202/head
2025-12-04T08:57:05.8613838Z  * [new branch]              gh/ezyang/3202/orig         -> origin/gh/ezyang/3202/orig
2025-12-04T08:57:05.8616013Z  * [new branch]              gh/ezyang/3203/base         -> origin/gh/ezyang/3203/base
2025-12-04T08:57:05.8617763Z  * [new branch]              gh/ezyang/3203/head         -> origin/gh/ezyang/3203/head
2025-12-04T08:57:05.8621192Z  * [new branch]              gh/ezyang/3203/orig         -> origin/gh/ezyang/3203/orig
2025-12-04T08:57:05.8623690Z  * [new branch]              gh/ezyang/3204/base         -> origin/gh/ezyang/3204/base
2025-12-04T08:57:05.8625306Z  * [new branch]              gh/ezyang/3204/head         -> origin/gh/ezyang/3204/head
2025-12-04T08:57:05.8626926Z  * [new branch]              gh/ezyang/3204/orig         -> origin/gh/ezyang/3204/orig
2025-12-04T08:57:05.8629122Z  * [new branch]              gh/ezyang/3205/base         -> origin/gh/ezyang/3205/base
2025-12-04T08:57:05.8630657Z  * [new branch]              gh/ezyang/3205/head         -> origin/gh/ezyang/3205/head
2025-12-04T08:57:05.8632252Z  * [new branch]              gh/ezyang/3205/orig         -> origin/gh/ezyang/3205/orig
2025-12-04T08:57:05.8634696Z  * [new branch]              gh/ezyang/3206/base         -> origin/gh/ezyang/3206/base
2025-12-04T08:57:05.8636285Z  * [new branch]              gh/ezyang/3206/head         -> origin/gh/ezyang/3206/head
2025-12-04T08:57:05.8637947Z  * [new branch]              gh/ezyang/3206/orig         -> origin/gh/ezyang/3206/orig
2025-12-04T08:57:05.8640183Z  * [new branch]              gh/ezyang/3207/base         -> origin/gh/ezyang/3207/base
2025-12-04T08:57:05.8641804Z  * [new branch]              gh/ezyang/3207/head         -> origin/gh/ezyang/3207/head
2025-12-04T08:57:05.8643463Z  * [new branch]              gh/ezyang/3207/orig         -> origin/gh/ezyang/3207/orig
2025-12-04T08:57:05.8645648Z  * [new branch]              gh/ezyang/3208/base         -> origin/gh/ezyang/3208/base
2025-12-04T08:57:05.8647171Z  * [new branch]              gh/ezyang/3208/head         -> origin/gh/ezyang/3208/head
2025-12-04T08:57:05.8648858Z  * [new branch]              gh/ezyang/3208/orig         -> origin/gh/ezyang/3208/orig
2025-12-04T08:57:05.8651201Z  * [new branch]              gh/ezyang/3209/base         -> origin/gh/ezyang/3209/base
2025-12-04T08:57:05.8652691Z  * [new branch]              gh/ezyang/3209/head         -> origin/gh/ezyang/3209/head
2025-12-04T08:57:05.8654348Z  * [new branch]              gh/ezyang/3209/orig         -> origin/gh/ezyang/3209/orig
2025-12-04T08:57:05.8656878Z  * [new branch]              gh/fadara01/3/base          -> origin/gh/fadara01/3/base
2025-12-04T08:57:05.8658435Z  * [new branch]              gh/fadara01/3/head          -> origin/gh/fadara01/3/head
2025-12-04T08:57:05.8659982Z  * [new branch]              gh/fadara01/3/orig          -> origin/gh/fadara01/3/orig
2025-12-04T08:57:05.8662199Z  * [new branch]              gh/fadara01/5/base          -> origin/gh/fadara01/5/base
2025-12-04T08:57:05.8663783Z  * [new branch]              gh/fadara01/5/head          -> origin/gh/fadara01/5/head
2025-12-04T08:57:05.8665393Z  * [new branch]              gh/fadara01/5/orig          -> origin/gh/fadara01/5/orig
2025-12-04T08:57:05.8667445Z  * [new branch]              gh/fadara01/6/base          -> origin/gh/fadara01/6/base
2025-12-04T08:57:05.8669094Z  * [new branch]              gh/fadara01/6/head          -> origin/gh/fadara01/6/head
2025-12-04T08:57:05.8670652Z  * [new branch]              gh/fadara01/6/orig          -> origin/gh/fadara01/6/orig
2025-12-04T08:57:05.8672807Z  * [new branch]              gh/fadara01/7/base          -> origin/gh/fadara01/7/base
2025-12-04T08:57:05.8674593Z  * [new branch]              gh/fadara01/7/head          -> origin/gh/fadara01/7/head
2025-12-04T08:57:05.8676047Z  * [new branch]              gh/fadara01/7/orig          -> origin/gh/fadara01/7/orig
2025-12-04T08:57:05.8678192Z  * [new branch]              gh/fadara01/8/base          -> origin/gh/fadara01/8/base
2025-12-04T08:57:05.8679706Z  * [new branch]              gh/fadara01/8/head          -> origin/gh/fadara01/8/head
2025-12-04T08:57:05.8681512Z  * [new branch]              gh/fadara01/8/orig          -> origin/gh/fadara01/8/orig
2025-12-04T08:57:05.8683551Z  * [new branch]              gh/fadara01/9/base          -> origin/gh/fadara01/9/base
2025-12-04T08:57:05.8685139Z  * [new branch]              gh/fadara01/9/head          -> origin/gh/fadara01/9/head
2025-12-04T08:57:05.8686745Z  * [new branch]              gh/fadara01/9/orig          -> origin/gh/fadara01/9/orig
2025-12-04T08:57:05.8689389Z  * [new branch]              gh/fduwjj/182/base          -> origin/gh/fduwjj/182/base
2025-12-04T08:57:05.8690960Z  * [new branch]              gh/fduwjj/182/head          -> origin/gh/fduwjj/182/head
2025-12-04T08:57:05.8692521Z  * [new branch]              gh/fduwjj/182/orig          -> origin/gh/fduwjj/182/orig
2025-12-04T08:57:05.8694690Z  * [new branch]              gh/fduwjj/211/base          -> origin/gh/fduwjj/211/base
2025-12-04T08:57:05.8696760Z  * [new branch]              gh/fduwjj/211/head          -> origin/gh/fduwjj/211/head
2025-12-04T08:57:05.8698354Z  * [new branch]              gh/fduwjj/211/orig          -> origin/gh/fduwjj/211/orig
2025-12-04T08:57:05.8700517Z  * [new branch]              gh/fduwjj/212/base          -> origin/gh/fduwjj/212/base
2025-12-04T08:57:05.8702153Z  * [new branch]              gh/fduwjj/212/head          -> origin/gh/fduwjj/212/head
2025-12-04T08:57:05.8703711Z  * [new branch]              gh/fduwjj/212/orig          -> origin/gh/fduwjj/212/orig
2025-12-04T08:57:05.8705810Z  * [new branch]              gh/fduwjj/213/base          -> origin/gh/fduwjj/213/base
2025-12-04T08:57:05.8707434Z  * [new branch]              gh/fduwjj/213/head          -> origin/gh/fduwjj/213/head
2025-12-04T08:57:05.8709098Z  * [new branch]              gh/fduwjj/213/orig          -> origin/gh/fduwjj/213/orig
2025-12-04T08:57:05.8711370Z  * [new branch]              gh/fduwjj/226/base          -> origin/gh/fduwjj/226/base
2025-12-04T08:57:05.8712910Z  * [new branch]              gh/fduwjj/226/head          -> origin/gh/fduwjj/226/head
2025-12-04T08:57:05.8714479Z  * [new branch]              gh/fduwjj/226/orig          -> origin/gh/fduwjj/226/orig
2025-12-04T08:57:05.8716786Z  * [new branch]              gh/fduwjj/229/base          -> origin/gh/fduwjj/229/base
2025-12-04T08:57:05.8718582Z  * [new branch]              gh/fduwjj/229/head          -> origin/gh/fduwjj/229/head
2025-12-04T08:57:05.8720170Z  * [new branch]              gh/fduwjj/229/orig          -> origin/gh/fduwjj/229/orig
2025-12-04T08:57:05.8722309Z  * [new branch]              gh/fduwjj/233/base          -> origin/gh/fduwjj/233/base
2025-12-04T08:57:05.8723904Z  * [new branch]              gh/fduwjj/233/head          -> origin/gh/fduwjj/233/head
2025-12-04T08:57:05.8725459Z  * [new branch]              gh/fduwjj/233/orig          -> origin/gh/fduwjj/233/orig
2025-12-04T08:57:05.8727633Z  * [new branch]              gh/fduwjj/234/base          -> origin/gh/fduwjj/234/base
2025-12-04T08:57:05.8729248Z  * [new branch]              gh/fduwjj/234/head          -> origin/gh/fduwjj/234/head
2025-12-04T08:57:05.8730848Z  * [new branch]              gh/fduwjj/234/orig          -> origin/gh/fduwjj/234/orig
2025-12-04T08:57:05.8733066Z  * [new branch]              gh/fduwjj/235/base          -> origin/gh/fduwjj/235/base
2025-12-04T08:57:05.8734605Z  * [new branch]              gh/fduwjj/235/head          -> origin/gh/fduwjj/235/head
2025-12-04T08:57:05.8736234Z  * [new branch]              gh/fduwjj/235/orig          -> origin/gh/fduwjj/235/orig
2025-12-04T08:57:05.8738207Z  * [new branch]              gh/fduwjj/236/base          -> origin/gh/fduwjj/236/base
2025-12-04T08:57:05.8739931Z  * [new branch]              gh/fduwjj/236/head          -> origin/gh/fduwjj/236/head
2025-12-04T08:57:05.8741381Z  * [new branch]              gh/fduwjj/236/orig          -> origin/gh/fduwjj/236/orig
2025-12-04T08:57:05.8743344Z  * [new branch]              gh/fduwjj/237/base          -> origin/gh/fduwjj/237/base
2025-12-04T08:57:05.8744914Z  * [new branch]              gh/fduwjj/237/head          -> origin/gh/fduwjj/237/head
2025-12-04T08:57:05.8746471Z  * [new branch]              gh/fduwjj/237/orig          -> origin/gh/fduwjj/237/orig
2025-12-04T08:57:05.8748602Z  * [new branch]              gh/fduwjj/238/base          -> origin/gh/fduwjj/238/base
2025-12-04T08:57:05.8750632Z  * [new branch]              gh/fduwjj/238/head          -> origin/gh/fduwjj/238/head
2025-12-04T08:57:05.8752286Z  * [new branch]              gh/fduwjj/238/orig          -> origin/gh/fduwjj/238/orig
2025-12-04T08:57:05.8754459Z  * [new branch]              gh/fduwjj/239/base          -> origin/gh/fduwjj/239/base
2025-12-04T08:57:05.8756231Z  * [new branch]              gh/fduwjj/239/head          -> origin/gh/fduwjj/239/head
2025-12-04T08:57:05.8757794Z  * [new branch]              gh/fduwjj/239/orig          -> origin/gh/fduwjj/239/orig
2025-12-04T08:57:05.8760368Z  * [new branch]              gh/fegin/332/base           -> origin/gh/fegin/332/base
2025-12-04T08:57:05.8762004Z  * [new branch]              gh/fegin/332/head           -> origin/gh/fegin/332/head
2025-12-04T08:57:05.8763707Z  * [new branch]              gh/fegin/332/orig           -> origin/gh/fegin/332/orig
2025-12-04T08:57:05.8765880Z  * [new branch]              gh/fegin/333/base           -> origin/gh/fegin/333/base
2025-12-04T08:57:05.8767492Z  * [new branch]              gh/fegin/333/head           -> origin/gh/fegin/333/head
2025-12-04T08:57:05.8769690Z  * [new branch]              gh/fegin/333/orig           -> origin/gh/fegin/333/orig
2025-12-04T08:57:05.8771842Z  * [new branch]              gh/fegin/334/base           -> origin/gh/fegin/334/base
2025-12-04T08:57:05.8773401Z  * [new branch]              gh/fegin/334/head           -> origin/gh/fegin/334/head
2025-12-04T08:57:05.8775101Z  * [new branch]              gh/fegin/334/orig           -> origin/gh/fegin/334/orig
2025-12-04T08:57:05.8777369Z  * [new branch]              gh/fegin/335/base           -> origin/gh/fegin/335/base
2025-12-04T08:57:05.8778998Z  * [new branch]              gh/fegin/335/head           -> origin/gh/fegin/335/head
2025-12-04T08:57:05.8780561Z  * [new branch]              gh/fegin/335/orig           -> origin/gh/fegin/335/orig
2025-12-04T08:57:05.8783077Z  * [new branch]              gh/fffrog/160/base          -> origin/gh/fffrog/160/base
2025-12-04T08:57:05.8784855Z  * [new branch]              gh/fffrog/160/head          -> origin/gh/fffrog/160/head
2025-12-04T08:57:05.8786936Z  * [new branch]              gh/fffrog/177/base          -> origin/gh/fffrog/177/base
2025-12-04T08:57:05.8788515Z  * [new branch]              gh/fffrog/177/head          -> origin/gh/fffrog/177/head
2025-12-04T08:57:05.8790023Z  * [new branch]              gh/fffrog/177/orig          -> origin/gh/fffrog/177/orig
2025-12-04T08:57:05.8792116Z  * [new branch]              gh/fffrog/178/base          -> origin/gh/fffrog/178/base
2025-12-04T08:57:05.8793768Z  * [new branch]              gh/fffrog/178/head          -> origin/gh/fffrog/178/head
2025-12-04T08:57:05.8795378Z  * [new branch]              gh/fffrog/178/orig          -> origin/gh/fffrog/178/orig
2025-12-04T08:57:05.8797973Z  * [new branch]              gh/fffrog/181/base          -> origin/gh/fffrog/181/base
2025-12-04T08:57:05.8800075Z  * [new branch]              gh/fffrog/181/head          -> origin/gh/fffrog/181/head
2025-12-04T08:57:05.8801791Z  * [new branch]              gh/fffrog/181/orig          -> origin/gh/fffrog/181/orig
2025-12-04T08:57:05.8803872Z  * [new branch]              gh/fffrog/183/base          -> origin/gh/fffrog/183/base
2025-12-04T08:57:05.8805509Z  * [new branch]              gh/fffrog/183/head          -> origin/gh/fffrog/183/head
2025-12-04T08:57:05.8807136Z  * [new branch]              gh/fffrog/183/orig          -> origin/gh/fffrog/183/orig
2025-12-04T08:57:05.8809734Z  * [new branch]              gh/fxdawnn/10/base          -> origin/gh/fxdawnn/10/base
2025-12-04T08:57:05.8811248Z  * [new branch]              gh/fxdawnn/10/head          -> origin/gh/fxdawnn/10/head
2025-12-04T08:57:05.8812841Z  * [new branch]              gh/fxdawnn/10/orig          -> origin/gh/fxdawnn/10/orig
2025-12-04T08:57:05.8815329Z  * [new branch]              gh/fxdawnn/11/base          -> origin/gh/fxdawnn/11/base
2025-12-04T08:57:05.8816779Z  * [new branch]              gh/fxdawnn/11/head          -> origin/gh/fxdawnn/11/head
2025-12-04T08:57:05.8818783Z  * [new branch]              gh/fxdawnn/11/orig          -> origin/gh/fxdawnn/11/orig
2025-12-04T08:57:05.8820868Z  * [new branch]              gh/fxdawnn/12/base          -> origin/gh/fxdawnn/12/base
2025-12-04T08:57:05.8822666Z  * [new branch]              gh/fxdawnn/12/head          -> origin/gh/fxdawnn/12/head
2025-12-04T08:57:05.8824115Z  * [new branch]              gh/fxdawnn/12/orig          -> origin/gh/fxdawnn/12/orig
2025-12-04T08:57:05.8826224Z  * [new branch]              gh/fxdawnn/13/base          -> origin/gh/fxdawnn/13/base
2025-12-04T08:57:05.8827830Z  * [new branch]              gh/fxdawnn/13/head          -> origin/gh/fxdawnn/13/head
2025-12-04T08:57:05.8829389Z  * [new branch]              gh/fxdawnn/13/orig          -> origin/gh/fxdawnn/13/orig
2025-12-04T08:57:05.8831624Z  * [new branch]              gh/fxdawnn/14/base          -> origin/gh/fxdawnn/14/base
2025-12-04T08:57:05.8833154Z  * [new branch]              gh/fxdawnn/14/head          -> origin/gh/fxdawnn/14/head
2025-12-04T08:57:05.8834759Z  * [new branch]              gh/fxdawnn/14/orig          -> origin/gh/fxdawnn/14/orig
2025-12-04T08:57:05.8836824Z  * [new branch]              gh/fxdawnn/15/base          -> origin/gh/fxdawnn/15/base
2025-12-04T08:57:05.8838422Z  * [new branch]              gh/fxdawnn/15/head          -> origin/gh/fxdawnn/15/head
2025-12-04T08:57:05.8840169Z  * [new branch]              gh/fxdawnn/15/orig          -> origin/gh/fxdawnn/15/orig
2025-12-04T08:57:05.8842323Z  * [new branch]              gh/fxdawnn/6/base           -> origin/gh/fxdawnn/6/base
2025-12-04T08:57:05.8843906Z  * [new branch]              gh/fxdawnn/6/head           -> origin/gh/fxdawnn/6/head
2025-12-04T08:57:05.8845516Z  * [new branch]              gh/fxdawnn/6/orig           -> origin/gh/fxdawnn/6/orig
2025-12-04T08:57:05.8847652Z  * [new branch]              gh/fxdawnn/7/base           -> origin/gh/fxdawnn/7/base
2025-12-04T08:57:05.8849301Z  * [new branch]              gh/fxdawnn/7/head           -> origin/gh/fxdawnn/7/head
2025-12-04T08:57:05.8850966Z  * [new branch]              gh/fxdawnn/7/orig           -> origin/gh/fxdawnn/7/orig
2025-12-04T08:57:05.8853164Z  * [new branch]              gh/fxdawnn/9/base           -> origin/gh/fxdawnn/9/base
2025-12-04T08:57:05.8854636Z  * [new branch]              gh/fxdawnn/9/head           -> origin/gh/fxdawnn/9/head
2025-12-04T08:57:05.8856183Z  * [new branch]              gh/fxdawnn/9/orig           -> origin/gh/fxdawnn/9/orig
2025-12-04T08:57:05.8859080Z  * [new branch]              gh/galv/1/base              -> origin/gh/galv/1/base
2025-12-04T08:57:05.8860363Z  * [new branch]              gh/galv/1/head              -> origin/gh/galv/1/head
2025-12-04T08:57:05.8861946Z  * [new branch]              gh/galv/1/orig              -> origin/gh/galv/1/orig
2025-12-04T08:57:05.8864079Z  * [new branch]              gh/galv/2/base              -> origin/gh/galv/2/base
2025-12-04T08:57:05.8865682Z  * [new branch]              gh/galv/2/head              -> origin/gh/galv/2/head
2025-12-04T08:57:05.8867336Z  * [new branch]              gh/galv/2/orig              -> origin/gh/galv/2/orig
2025-12-04T08:57:05.8869519Z  * [new branch]              gh/galv/3/base              -> origin/gh/galv/3/base
2025-12-04T08:57:05.8871243Z  * [new branch]              gh/galv/3/head              -> origin/gh/galv/3/head
2025-12-04T08:57:05.8873075Z  * [new branch]              gh/galv/3/orig              -> origin/gh/galv/3/orig
2025-12-04T08:57:05.8875708Z  * [new branch]              gh/guangyey/134/base        -> origin/gh/guangyey/134/base
2025-12-04T08:57:05.8877265Z  * [new branch]              gh/guangyey/134/head        -> origin/gh/guangyey/134/head
2025-12-04T08:57:05.8878845Z  * [new branch]              gh/guangyey/134/orig        -> origin/gh/guangyey/134/orig
2025-12-04T08:57:05.8881150Z  * [new branch]              gh/guangyey/163/base        -> origin/gh/guangyey/163/base
2025-12-04T08:57:05.8882733Z  * [new branch]              gh/guangyey/163/head        -> origin/gh/guangyey/163/head
2025-12-04T08:57:05.8884308Z  * [new branch]              gh/guangyey/163/orig        -> origin/gh/guangyey/163/orig
2025-12-04T08:57:05.8886408Z  * [new branch]              gh/guangyey/168/base        -> origin/gh/guangyey/168/base
2025-12-04T08:57:05.8888034Z  * [new branch]              gh/guangyey/168/head        -> origin/gh/guangyey/168/head
2025-12-04T08:57:05.8889632Z  * [new branch]              gh/guangyey/168/orig        -> origin/gh/guangyey/168/orig
2025-12-04T08:57:05.8891743Z  * [new branch]              gh/guangyey/169/base        -> origin/gh/guangyey/169/base
2025-12-04T08:57:05.8893353Z  * [new branch]              gh/guangyey/169/head        -> origin/gh/guangyey/169/head
2025-12-04T08:57:05.8894926Z  * [new branch]              gh/guangyey/169/orig        -> origin/gh/guangyey/169/orig
2025-12-04T08:57:05.8897509Z  * [new branch]              gh/guangyey/170/base        -> origin/gh/guangyey/170/base
2025-12-04T08:57:05.8899099Z  * [new branch]              gh/guangyey/170/head        -> origin/gh/guangyey/170/head
2025-12-04T08:57:05.8900846Z  * [new branch]              gh/guangyey/170/orig        -> origin/gh/guangyey/170/orig
2025-12-04T08:57:05.8903008Z  * [new branch]              gh/guangyey/171/base        -> origin/gh/guangyey/171/base
2025-12-04T08:57:05.8904546Z  * [new branch]              gh/guangyey/171/head        -> origin/gh/guangyey/171/head
2025-12-04T08:57:05.8906169Z  * [new branch]              gh/guangyey/171/orig        -> origin/gh/guangyey/171/orig
2025-12-04T08:57:05.8908314Z  * [new branch]              gh/guangyey/178/base        -> origin/gh/guangyey/178/base
2025-12-04T08:57:05.8909816Z  * [new branch]              gh/guangyey/178/head        -> origin/gh/guangyey/178/head
2025-12-04T08:57:05.8911438Z  * [new branch]              gh/guangyey/178/orig        -> origin/gh/guangyey/178/orig
2025-12-04T08:57:05.8913525Z  * [new branch]              gh/guangyey/182/base        -> origin/gh/guangyey/182/base
2025-12-04T08:57:05.8915144Z  * [new branch]              gh/guangyey/182/head        -> origin/gh/guangyey/182/head
2025-12-04T08:57:05.8916701Z  * [new branch]              gh/guangyey/182/orig        -> origin/gh/guangyey/182/orig
2025-12-04T08:57:05.8919050Z  * [new branch]              gh/guangyey/183/base        -> origin/gh/guangyey/183/base
2025-12-04T08:57:05.8920643Z  * [new branch]              gh/guangyey/183/head        -> origin/gh/guangyey/183/head
2025-12-04T08:57:05.8922290Z  * [new branch]              gh/guangyey/183/orig        -> origin/gh/guangyey/183/orig
2025-12-04T08:57:05.8924499Z  * [new branch]              gh/guangyey/185/base        -> origin/gh/guangyey/185/base
2025-12-04T08:57:05.8926124Z  * [new branch]              gh/guangyey/185/head        -> origin/gh/guangyey/185/head
2025-12-04T08:57:05.8927774Z  * [new branch]              gh/guangyey/185/orig        -> origin/gh/guangyey/185/orig
2025-12-04T08:57:05.8929880Z  * [new branch]              gh/guangyey/186/base        -> origin/gh/guangyey/186/base
2025-12-04T08:57:05.8931501Z  * [new branch]              gh/guangyey/186/head        -> origin/gh/guangyey/186/head
2025-12-04T08:57:05.8933000Z  * [new branch]              gh/guangyey/186/orig        -> origin/gh/guangyey/186/orig
2025-12-04T08:57:05.8935142Z  * [new branch]              gh/guangyey/187/base        -> origin/gh/guangyey/187/base
2025-12-04T08:57:05.8936852Z  * [new branch]              gh/guangyey/187/head        -> origin/gh/guangyey/187/head
2025-12-04T08:57:05.8938311Z  * [new branch]              gh/guangyey/187/orig        -> origin/gh/guangyey/187/orig
2025-12-04T08:57:05.8940388Z  * [new branch]              gh/guangyey/188/base        -> origin/gh/guangyey/188/base
2025-12-04T08:57:05.8941937Z  * [new branch]              gh/guangyey/188/head        -> origin/gh/guangyey/188/head
2025-12-04T08:57:05.8943513Z  * [new branch]              gh/guangyey/188/orig        -> origin/gh/guangyey/188/orig
2025-12-04T08:57:05.8945690Z  * [new branch]              gh/guangyey/190/base        -> origin/gh/guangyey/190/base
2025-12-04T08:57:05.8947257Z  * [new branch]              gh/guangyey/190/head        -> origin/gh/guangyey/190/head
2025-12-04T08:57:05.8948920Z  * [new branch]              gh/guangyey/190/orig        -> origin/gh/guangyey/190/orig
2025-12-04T08:57:05.8950983Z  * [new branch]              gh/guangyey/208/base        -> origin/gh/guangyey/208/base
2025-12-04T08:57:05.8952606Z  * [new branch]              gh/guangyey/208/head        -> origin/gh/guangyey/208/head
2025-12-04T08:57:05.8954174Z  * [new branch]              gh/guangyey/208/orig        -> origin/gh/guangyey/208/orig
2025-12-04T08:57:05.8956286Z  * [new branch]              gh/guangyey/228/base        -> origin/gh/guangyey/228/base
2025-12-04T08:57:05.8957865Z  * [new branch]              gh/guangyey/228/head        -> origin/gh/guangyey/228/head
2025-12-04T08:57:05.8959414Z  * [new branch]              gh/guangyey/228/orig        -> origin/gh/guangyey/228/orig
2025-12-04T08:57:05.8962107Z  * [new branch]              gh/guangyey/230/base        -> origin/gh/guangyey/230/base
2025-12-04T08:57:05.8963694Z  * [new branch]              gh/guangyey/230/head        -> origin/gh/guangyey/230/head
2025-12-04T08:57:05.8965256Z  * [new branch]              gh/guangyey/230/orig        -> origin/gh/guangyey/230/orig
2025-12-04T08:57:05.8967512Z  * [new branch]              gh/guangyey/231/base        -> origin/gh/guangyey/231/base
2025-12-04T08:57:05.8969112Z  * [new branch]              gh/guangyey/231/head        -> origin/gh/guangyey/231/head
2025-12-04T08:57:05.8970825Z  * [new branch]              gh/guangyey/231/orig        -> origin/gh/guangyey/231/orig
2025-12-04T08:57:05.8973002Z  * [new branch]              gh/guangyey/232/base        -> origin/gh/guangyey/232/base
2025-12-04T08:57:05.8974540Z  * [new branch]              gh/guangyey/232/head        -> origin/gh/guangyey/232/head
2025-12-04T08:57:05.8976125Z  * [new branch]              gh/guangyey/232/orig        -> origin/gh/guangyey/232/orig
2025-12-04T08:57:05.8978312Z  * [new branch]              gh/guangyey/233/base        -> origin/gh/guangyey/233/base
2025-12-04T08:57:05.8979906Z  * [new branch]              gh/guangyey/233/head        -> origin/gh/guangyey/233/head
2025-12-04T08:57:05.8981475Z  * [new branch]              gh/guangyey/233/orig        -> origin/gh/guangyey/233/orig
2025-12-04T08:57:05.8983737Z  * [new branch]              gh/guangyey/234/base        -> origin/gh/guangyey/234/base
2025-12-04T08:57:05.8985281Z  * [new branch]              gh/guangyey/234/head        -> origin/gh/guangyey/234/head
2025-12-04T08:57:05.8986894Z  * [new branch]              gh/guangyey/234/orig        -> origin/gh/guangyey/234/orig
2025-12-04T08:57:05.8989119Z  * [new branch]              gh/guangyey/235/base        -> origin/gh/guangyey/235/base
2025-12-04T08:57:05.8990700Z  * [new branch]              gh/guangyey/235/head        -> origin/gh/guangyey/235/head
2025-12-04T08:57:05.8992258Z  * [new branch]              gh/guangyey/235/orig        -> origin/gh/guangyey/235/orig
2025-12-04T08:57:05.8994418Z  * [new branch]              gh/guangyey/236/base        -> origin/gh/guangyey/236/base
2025-12-04T08:57:05.8996032Z  * [new branch]              gh/guangyey/236/head        -> origin/gh/guangyey/236/head
2025-12-04T08:57:05.8997558Z  * [new branch]              gh/guangyey/236/orig        -> origin/gh/guangyey/236/orig
2025-12-04T08:57:05.8999962Z  * [new branch]              gh/guangyey/237/base        -> origin/gh/guangyey/237/base
2025-12-04T08:57:05.9001543Z  * [new branch]              gh/guangyey/237/head        -> origin/gh/guangyey/237/head
2025-12-04T08:57:05.9003189Z  * [new branch]              gh/guangyey/237/orig        -> origin/gh/guangyey/237/orig
2025-12-04T08:57:05.9005292Z  * [new branch]              gh/guangyey/238/base        -> origin/gh/guangyey/238/base
2025-12-04T08:57:05.9006861Z  * [new branch]              gh/guangyey/238/head        -> origin/gh/guangyey/238/head
2025-12-04T08:57:05.9008999Z  * [new branch]              gh/guangyey/239/base        -> origin/gh/guangyey/239/base
2025-12-04T08:57:05.9010654Z  * [new branch]              gh/guangyey/239/head        -> origin/gh/guangyey/239/head
2025-12-04T08:57:05.9012243Z  * [new branch]              gh/guangyey/239/orig        -> origin/gh/guangyey/239/orig
2025-12-04T08:57:05.9014434Z  * [new branch]              gh/guangyey/240/base        -> origin/gh/guangyey/240/base
2025-12-04T08:57:05.9016013Z  * [new branch]              gh/guangyey/240/head        -> origin/gh/guangyey/240/head
2025-12-04T08:57:05.9017775Z  * [new branch]              gh/guangyey/240/orig        -> origin/gh/guangyey/240/orig
2025-12-04T08:57:05.9021471Z  * [new branch]              gh/guangyey/241/base        -> origin/gh/guangyey/241/base
2025-12-04T08:57:05.9023064Z  * [new branch]              gh/guangyey/241/head        -> origin/gh/guangyey/241/head
2025-12-04T08:57:05.9024589Z  * [new branch]              gh/guangyey/241/orig        -> origin/gh/guangyey/241/orig
2025-12-04T08:57:05.9026757Z  * [new branch]              gh/guangyey/242/base        -> origin/gh/guangyey/242/base
2025-12-04T08:57:05.9028367Z  * [new branch]              gh/guangyey/242/head        -> origin/gh/guangyey/242/head
2025-12-04T08:57:05.9029956Z  * [new branch]              gh/guangyey/242/orig        -> origin/gh/guangyey/242/orig
2025-12-04T08:57:05.9032130Z  * [new branch]              gh/guangyey/243/base        -> origin/gh/guangyey/243/base
2025-12-04T08:57:05.9033803Z  * [new branch]              gh/guangyey/243/head        -> origin/gh/guangyey/243/head
2025-12-04T08:57:05.9035369Z  * [new branch]              gh/guangyey/243/orig        -> origin/gh/guangyey/243/orig
2025-12-04T08:57:05.9037633Z  * [new branch]              gh/guangyey/244/base        -> origin/gh/guangyey/244/base
2025-12-04T08:57:05.9039220Z  * [new branch]              gh/guangyey/244/head        -> origin/gh/guangyey/244/head
2025-12-04T08:57:05.9041118Z  * [new branch]              gh/guangyey/244/orig        -> origin/gh/guangyey/244/orig
2025-12-04T08:57:05.9043315Z  * [new branch]              gh/guangyey/245/base        -> origin/gh/guangyey/245/base
2025-12-04T08:57:05.9044872Z  * [new branch]              gh/guangyey/245/head        -> origin/gh/guangyey/245/head
2025-12-04T08:57:05.9046451Z  * [new branch]              gh/guangyey/245/orig        -> origin/gh/guangyey/245/orig
2025-12-04T08:57:05.9048596Z  * [new branch]              gh/guangyey/246/base        -> origin/gh/guangyey/246/base
2025-12-04T08:57:05.9050143Z  * [new branch]              gh/guangyey/246/head        -> origin/gh/guangyey/246/head
2025-12-04T08:57:05.9051746Z  * [new branch]              gh/guangyey/246/orig        -> origin/gh/guangyey/246/orig
2025-12-04T08:57:05.9053949Z  * [new branch]              gh/guangyey/247/base        -> origin/gh/guangyey/247/base
2025-12-04T08:57:05.9055618Z  * [new branch]              gh/guangyey/247/head        -> origin/gh/guangyey/247/head
2025-12-04T08:57:05.9057191Z  * [new branch]              gh/guangyey/247/orig        -> origin/gh/guangyey/247/orig
2025-12-04T08:57:05.9059496Z  * [new branch]              gh/guangyey/248/base        -> origin/gh/guangyey/248/base
2025-12-04T08:57:05.9061061Z  * [new branch]              gh/guangyey/248/head        -> origin/gh/guangyey/248/head
2025-12-04T08:57:05.9062638Z  * [new branch]              gh/guangyey/248/orig        -> origin/gh/guangyey/248/orig
2025-12-04T08:57:05.9065019Z  * [new branch]              gh/guangyey/249/base        -> origin/gh/guangyey/249/base
2025-12-04T08:57:05.9066383Z  * [new branch]              gh/guangyey/249/head        -> origin/gh/guangyey/249/head
2025-12-04T08:57:05.9068080Z  * [new branch]              gh/guangyey/249/orig        -> origin/gh/guangyey/249/orig
2025-12-04T08:57:05.9070181Z  * [new branch]              gh/guangyey/250/base        -> origin/gh/guangyey/250/base
2025-12-04T08:57:05.9071796Z  * [new branch]              gh/guangyey/250/head        -> origin/gh/guangyey/250/head
2025-12-04T08:57:05.9073382Z  * [new branch]              gh/guangyey/250/orig        -> origin/gh/guangyey/250/orig
2025-12-04T08:57:05.9075464Z  * [new branch]              gh/guangyey/251/base        -> origin/gh/guangyey/251/base
2025-12-04T08:57:05.9077166Z  * [new branch]              gh/guangyey/251/head        -> origin/gh/guangyey/251/head
2025-12-04T08:57:05.9078777Z  * [new branch]              gh/guangyey/251/orig        -> origin/gh/guangyey/251/orig
2025-12-04T08:57:05.9081187Z  * [new branch]              gh/guangyey/252/base        -> origin/gh/guangyey/252/base
2025-12-04T08:57:05.9082768Z  * [new branch]              gh/guangyey/252/head        -> origin/gh/guangyey/252/head
2025-12-04T08:57:05.9084293Z  * [new branch]              gh/guangyey/252/orig        -> origin/gh/guangyey/252/orig
2025-12-04T08:57:05.9086405Z  * [new branch]              gh/guangyey/253/base        -> origin/gh/guangyey/253/base
2025-12-04T08:57:05.9088056Z  * [new branch]              gh/guangyey/253/head        -> origin/gh/guangyey/253/head
2025-12-04T08:57:05.9089455Z  * [new branch]              gh/guangyey/253/orig        -> origin/gh/guangyey/253/orig
2025-12-04T08:57:05.9091635Z  * [new branch]              gh/guangyey/254/base        -> origin/gh/guangyey/254/base
2025-12-04T08:57:05.9093237Z  * [new branch]              gh/guangyey/254/head        -> origin/gh/guangyey/254/head
2025-12-04T08:57:05.9094767Z  * [new branch]              gh/guangyey/254/orig        -> origin/gh/guangyey/254/orig
2025-12-04T08:57:05.9096969Z  * [new branch]              gh/guangyey/255/base        -> origin/gh/guangyey/255/base
2025-12-04T08:57:05.9098685Z  * [new branch]              gh/guangyey/255/head        -> origin/gh/guangyey/255/head
2025-12-04T08:57:05.9100209Z  * [new branch]              gh/guangyey/255/orig        -> origin/gh/guangyey/255/orig
2025-12-04T08:57:05.9102911Z  * [new branch]              gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base
2025-12-04T08:57:05.9104512Z  * [new branch]              gh/guilhermeleobas/107/head -> origin/gh/guilhermeleobas/107/head
2025-12-04T08:57:05.9106112Z  * [new branch]              gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig
2025-12-04T08:57:05.9108501Z  * [new branch]              gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base
2025-12-04T08:57:05.9110609Z  * [new branch]              gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head
2025-12-04T08:57:05.9112969Z  * [new branch]              gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig
2025-12-04T08:57:05.9115846Z  * [new branch]              gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base
2025-12-04T08:57:05.9118307Z  * [new branch]              gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head
2025-12-04T08:57:05.9122191Z  * [new branch]              gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig
2025-12-04T08:57:05.9125099Z  * [new branch]              gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base
2025-12-04T08:57:05.9127851Z  * [new branch]              gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head
2025-12-04T08:57:05.9128572Z  * [new branch]              gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig
2025-12-04T08:57:05.9130845Z  * [new branch]              gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base
2025-12-04T08:57:05.9132793Z  * [new branch]              gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head
2025-12-04T08:57:05.9133910Z  * [new branch]              gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig
2025-12-04T08:57:05.9136143Z  * [new branch]              gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base
2025-12-04T08:57:05.9137742Z  * [new branch]              gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head
2025-12-04T08:57:05.9139342Z  * [new branch]              gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig
2025-12-04T08:57:05.9141501Z  * [new branch]              gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base
2025-12-04T08:57:05.9143045Z  * [new branch]              gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head
2025-12-04T08:57:05.9144648Z  * [new branch]              gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig
2025-12-04T08:57:05.9146895Z  * [new branch]              gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base
2025-12-04T08:57:05.9148883Z  * [new branch]              gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head
2025-12-04T08:57:05.9150450Z  * [new branch]              gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig
2025-12-04T08:57:05.9152561Z  * [new branch]              gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base
2025-12-04T08:57:05.9154169Z  * [new branch]              gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head
2025-12-04T08:57:05.9155826Z  * [new branch]              gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig
2025-12-04T08:57:05.9157947Z  * [new branch]              gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base
2025-12-04T08:57:05.9159688Z  * [new branch]              gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head
2025-12-04T08:57:05.9161402Z  * [new branch]              gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig
2025-12-04T08:57:05.9163545Z  * [new branch]              gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base
2025-12-04T08:57:05.9165263Z  * [new branch]              gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head
2025-12-04T08:57:05.9166878Z  * [new branch]              gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig
2025-12-04T08:57:05.9169073Z  * [new branch]              gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base
2025-12-04T08:57:05.9170604Z  * [new branch]              gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head
2025-12-04T08:57:05.9172187Z  * [new branch]              gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig
2025-12-04T08:57:05.9174309Z  * [new branch]              gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base
2025-12-04T08:57:05.9175904Z  * [new branch]              gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head
2025-12-04T08:57:05.9177523Z  * [new branch]              gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig
2025-12-04T08:57:05.9179692Z  * [new branch]              gh/guilhermeleobas/247/base -> origin/gh/guilhermeleobas/247/base
2025-12-04T08:57:05.9181294Z  * [new branch]              gh/guilhermeleobas/247/head -> origin/gh/guilhermeleobas/247/head
2025-12-04T08:57:05.9182896Z  * [new branch]              gh/guilhermeleobas/247/orig -> origin/gh/guilhermeleobas/247/orig
2025-12-04T08:57:05.9184998Z  * [new branch]              gh/guilhermeleobas/248/base -> origin/gh/guilhermeleobas/248/base
2025-12-04T08:57:05.9186617Z  * [new branch]              gh/guilhermeleobas/248/head -> origin/gh/guilhermeleobas/248/head
2025-12-04T08:57:05.9188237Z  * [new branch]              gh/guilhermeleobas/248/orig -> origin/gh/guilhermeleobas/248/orig
2025-12-04T08:57:05.9192047Z  * [new branch]              gh/guilhermeleobas/250/base -> origin/gh/guilhermeleobas/250/base
2025-12-04T08:57:05.9192915Z  * [new branch]              gh/guilhermeleobas/250/head -> origin/gh/guilhermeleobas/250/head
2025-12-04T08:57:05.9193914Z  * [new branch]              gh/guilhermeleobas/250/orig -> origin/gh/guilhermeleobas/250/orig
2025-12-04T08:57:05.9196726Z  * [new branch]              gh/guilhermeleobas/253/base -> origin/gh/guilhermeleobas/253/base
2025-12-04T08:57:05.9198315Z  * [new branch]              gh/guilhermeleobas/253/head -> origin/gh/guilhermeleobas/253/head
2025-12-04T08:57:05.9200013Z  * [new branch]              gh/guilhermeleobas/253/orig -> origin/gh/guilhermeleobas/253/orig
2025-12-04T08:57:05.9202371Z  * [new branch]              gh/guilhermeleobas/254/base -> origin/gh/guilhermeleobas/254/base
2025-12-04T08:57:05.9204371Z  * [new branch]              gh/guilhermeleobas/254/head -> origin/gh/guilhermeleobas/254/head
2025-12-04T08:57:05.9205909Z  * [new branch]              gh/guilhermeleobas/254/orig -> origin/gh/guilhermeleobas/254/orig
2025-12-04T08:57:05.9208107Z  * [new branch]              gh/guilhermeleobas/255/base -> origin/gh/guilhermeleobas/255/base
2025-12-04T08:57:05.9209841Z  * [new branch]              gh/guilhermeleobas/255/head -> origin/gh/guilhermeleobas/255/head
2025-12-04T08:57:05.9211429Z  * [new branch]              gh/guilhermeleobas/255/orig -> origin/gh/guilhermeleobas/255/orig
2025-12-04T08:57:05.9213681Z  * [new branch]              gh/guilhermeleobas/256/base -> origin/gh/guilhermeleobas/256/base
2025-12-04T08:57:05.9215426Z  * [new branch]              gh/guilhermeleobas/256/head -> origin/gh/guilhermeleobas/256/head
2025-12-04T08:57:05.9216920Z  * [new branch]              gh/guilhermeleobas/256/orig -> origin/gh/guilhermeleobas/256/orig
2025-12-04T08:57:05.9219282Z  * [new branch]              gh/guilhermeleobas/257/base -> origin/gh/guilhermeleobas/257/base
2025-12-04T08:57:05.9221175Z  * [new branch]              gh/guilhermeleobas/257/head -> origin/gh/guilhermeleobas/257/head
2025-12-04T08:57:05.9222634Z  * [new branch]              gh/guilhermeleobas/257/orig -> origin/gh/guilhermeleobas/257/orig
2025-12-04T08:57:05.9224796Z  * [new branch]              gh/guilhermeleobas/258/base -> origin/gh/guilhermeleobas/258/base
2025-12-04T08:57:05.9226388Z  * [new branch]              gh/guilhermeleobas/258/head -> origin/gh/guilhermeleobas/258/head
2025-12-04T08:57:05.9228068Z  * [new branch]              gh/guilhermeleobas/258/orig -> origin/gh/guilhermeleobas/258/orig
2025-12-04T08:57:05.9230144Z  * [new branch]              gh/guilhermeleobas/259/base -> origin/gh/guilhermeleobas/259/base
2025-12-04T08:57:05.9231706Z  * [new branch]              gh/guilhermeleobas/259/head -> origin/gh/guilhermeleobas/259/head
2025-12-04T08:57:05.9233316Z  * [new branch]              gh/guilhermeleobas/259/orig -> origin/gh/guilhermeleobas/259/orig
2025-12-04T08:57:05.9235587Z  * [new branch]              gh/guilhermeleobas/260/base -> origin/gh/guilhermeleobas/260/base
2025-12-04T08:57:05.9237206Z  * [new branch]              gh/guilhermeleobas/260/head -> origin/gh/guilhermeleobas/260/head
2025-12-04T08:57:05.9238840Z  * [new branch]              gh/guilhermeleobas/260/orig -> origin/gh/guilhermeleobas/260/orig
2025-12-04T08:57:05.9241278Z  * [new branch]              gh/guilhermeleobas/261/base -> origin/gh/guilhermeleobas/261/base
2025-12-04T08:57:05.9242771Z  * [new branch]              gh/guilhermeleobas/261/head -> origin/gh/guilhermeleobas/261/head
2025-12-04T08:57:05.9244320Z  * [new branch]              gh/guilhermeleobas/261/orig -> origin/gh/guilhermeleobas/261/orig
2025-12-04T08:57:05.9246510Z  * [new branch]              gh/guilhermeleobas/262/base -> origin/gh/guilhermeleobas/262/base
2025-12-04T08:57:05.9248187Z  * [new branch]              gh/guilhermeleobas/262/head -> origin/gh/guilhermeleobas/262/head
2025-12-04T08:57:05.9249680Z  * [new branch]              gh/guilhermeleobas/262/orig -> origin/gh/guilhermeleobas/262/orig
2025-12-04T08:57:05.9251861Z  * [new branch]              gh/guilhermeleobas/263/base -> origin/gh/guilhermeleobas/263/base
2025-12-04T08:57:05.9253606Z  * [new branch]              gh/guilhermeleobas/263/head -> origin/gh/guilhermeleobas/263/head
2025-12-04T08:57:05.9255389Z  * [new branch]              gh/guilhermeleobas/263/orig -> origin/gh/guilhermeleobas/263/orig
2025-12-04T08:57:05.9257343Z  * [new branch]              gh/guilhermeleobas/264/base -> origin/gh/guilhermeleobas/264/base
2025-12-04T08:57:05.9258898Z  * [new branch]              gh/guilhermeleobas/264/head -> origin/gh/guilhermeleobas/264/head
2025-12-04T08:57:05.9260550Z  * [new branch]              gh/guilhermeleobas/264/orig -> origin/gh/guilhermeleobas/264/orig
2025-12-04T08:57:05.9262773Z  * [new branch]              gh/guilhermeleobas/265/base -> origin/gh/guilhermeleobas/265/base
2025-12-04T08:57:05.9264373Z  * [new branch]              gh/guilhermeleobas/265/head -> origin/gh/guilhermeleobas/265/head
2025-12-04T08:57:05.9265947Z  * [new branch]              gh/guilhermeleobas/265/orig -> origin/gh/guilhermeleobas/265/orig
2025-12-04T08:57:05.9268209Z  * [new branch]              gh/guilhermeleobas/266/base -> origin/gh/guilhermeleobas/266/base
2025-12-04T08:57:05.9269776Z  * [new branch]              gh/guilhermeleobas/266/head -> origin/gh/guilhermeleobas/266/head
2025-12-04T08:57:05.9271355Z  * [new branch]              gh/guilhermeleobas/266/orig -> origin/gh/guilhermeleobas/266/orig
2025-12-04T08:57:05.9273581Z  * [new branch]              gh/guilhermeleobas/267/base -> origin/gh/guilhermeleobas/267/base
2025-12-04T08:57:05.9275190Z  * [new branch]              gh/guilhermeleobas/267/head -> origin/gh/guilhermeleobas/267/head
2025-12-04T08:57:05.9276803Z  * [new branch]              gh/guilhermeleobas/267/orig -> origin/gh/guilhermeleobas/267/orig
2025-12-04T08:57:05.9279432Z  * [new branch]              gh/hameerabbasi/1/base      -> origin/gh/hameerabbasi/1/base
2025-12-04T08:57:05.9281175Z  * [new branch]              gh/hameerabbasi/1/head      -> origin/gh/hameerabbasi/1/head
2025-12-04T08:57:05.9283230Z  * [new branch]              gh/hameerabbasi/2/base      -> origin/gh/hameerabbasi/2/base
2025-12-04T08:57:05.9284826Z  * [new branch]              gh/hameerabbasi/2/head      -> origin/gh/hameerabbasi/2/head
2025-12-04T08:57:05.9286407Z  * [new branch]              gh/hameerabbasi/2/orig      -> origin/gh/hameerabbasi/2/orig
2025-12-04T08:57:05.9288439Z  * [new branch]              gh/hameerabbasi/3/base      -> origin/gh/hameerabbasi/3/base
2025-12-04T08:57:05.9290116Z  * [new branch]              gh/hameerabbasi/3/head      -> origin/gh/hameerabbasi/3/head
2025-12-04T08:57:05.9291784Z  * [new branch]              gh/hameerabbasi/3/orig      -> origin/gh/hameerabbasi/3/orig
2025-12-04T08:57:05.9293796Z  * [new branch]              gh/hameerabbasi/4/base      -> origin/gh/hameerabbasi/4/base
2025-12-04T08:57:05.9295381Z  * [new branch]              gh/hameerabbasi/4/head      -> origin/gh/hameerabbasi/4/head
2025-12-04T08:57:05.9296948Z  * [new branch]              gh/hameerabbasi/4/orig      -> origin/gh/hameerabbasi/4/orig
2025-12-04T08:57:05.9299572Z  * [new branch]              gh/huydhn/1/next            -> origin/gh/huydhn/1/next
2025-12-04T08:57:05.9301565Z  * [new branch]              gh/huydhn/2/next            -> origin/gh/huydhn/2/next
2025-12-04T08:57:05.9303664Z  * [new branch]              gh/huydhn/3/next            -> origin/gh/huydhn/3/next
2025-12-04T08:57:05.9305824Z  * [new branch]              gh/huydhn/4/next            -> origin/gh/huydhn/4/next
2025-12-04T08:57:05.9307926Z  * [new branch]              gh/huydhn/5/next            -> origin/gh/huydhn/5/next
2025-12-04T08:57:05.9309986Z  * [new branch]              gh/huydhn/6/next            -> origin/gh/huydhn/6/next
2025-12-04T08:57:05.9312628Z  * [new branch]              gh/int3/97/base             -> origin/gh/int3/97/base
2025-12-04T08:57:05.9314220Z  * [new branch]              gh/int3/97/head             -> origin/gh/int3/97/head
2025-12-04T08:57:05.9316897Z  * [new branch]              gh/isuruf/101/base          -> origin/gh/isuruf/101/base
2025-12-04T08:57:05.9318874Z  * [new branch]              gh/isuruf/101/head          -> origin/gh/isuruf/101/head
2025-12-04T08:57:05.9320959Z  * [new branch]              gh/isuruf/146/base          -> origin/gh/isuruf/146/base
2025-12-04T08:57:05.9322486Z  * [new branch]              gh/isuruf/146/head          -> origin/gh/isuruf/146/head
2025-12-04T08:57:05.9324073Z  * [new branch]              gh/isuruf/146/orig          -> origin/gh/isuruf/146/orig
2025-12-04T08:57:05.9326208Z  * [new branch]              gh/isuruf/158/base          -> origin/gh/isuruf/158/base
2025-12-04T08:57:05.9327851Z  * [new branch]              gh/isuruf/158/head          -> origin/gh/isuruf/158/head
2025-12-04T08:57:05.9329870Z  * [new branch]              gh/isuruf/159/base          -> origin/gh/isuruf/159/base
2025-12-04T08:57:05.9331381Z  * [new branch]              gh/isuruf/159/head          -> origin/gh/isuruf/159/head
2025-12-04T08:57:05.9333528Z  * [new branch]              gh/isuruf/160/base          -> origin/gh/isuruf/160/base
2025-12-04T08:57:05.9335719Z  * [new branch]              gh/isuruf/160/head          -> origin/gh/isuruf/160/head
2025-12-04T08:57:05.9337313Z  * [new branch]              gh/isuruf/160/orig          -> origin/gh/isuruf/160/orig
2025-12-04T08:57:05.9339508Z  * [new branch]              gh/isuruf/81/base           -> origin/gh/isuruf/81/base
2025-12-04T08:57:05.9341142Z  * [new branch]              gh/isuruf/81/head           -> origin/gh/isuruf/81/head
2025-12-04T08:57:05.9342727Z  * [new branch]              gh/isuruf/81/orig           -> origin/gh/isuruf/81/orig
2025-12-04T08:57:05.9345234Z  * [new branch]              gh/jamesjwu/176/base        -> origin/gh/jamesjwu/176/base
2025-12-04T08:57:05.9346825Z  * [new branch]              gh/jamesjwu/176/head        -> origin/gh/jamesjwu/176/head
2025-12-04T08:57:05.9348442Z  * [new branch]              gh/jamesjwu/176/orig        -> origin/gh/jamesjwu/176/orig
2025-12-04T08:57:05.9350519Z  * [new branch]              gh/jamesjwu/187/base        -> origin/gh/jamesjwu/187/base
2025-12-04T08:57:05.9352234Z  * [new branch]              gh/jamesjwu/187/head        -> origin/gh/jamesjwu/187/head
2025-12-04T08:57:05.9353741Z  * [new branch]              gh/jamesjwu/187/orig        -> origin/gh/jamesjwu/187/orig
2025-12-04T08:57:05.9356121Z  * [new branch]              gh/jamesjwu/196/base        -> origin/gh/jamesjwu/196/base
2025-12-04T08:57:05.9357690Z  * [new branch]              gh/jamesjwu/196/head        -> origin/gh/jamesjwu/196/head
2025-12-04T08:57:05.9359225Z  * [new branch]              gh/jamesjwu/196/orig        -> origin/gh/jamesjwu/196/orig
2025-12-04T08:57:05.9361475Z  * [new branch]              gh/jamesjwu/198/base        -> origin/gh/jamesjwu/198/base
2025-12-04T08:57:05.9363025Z  * [new branch]              gh/jamesjwu/198/head        -> origin/gh/jamesjwu/198/head
2025-12-04T08:57:05.9364583Z  * [new branch]              gh/jamesjwu/198/orig        -> origin/gh/jamesjwu/198/orig
2025-12-04T08:57:05.9366708Z  * [new branch]              gh/jamesjwu/207/base        -> origin/gh/jamesjwu/207/base
2025-12-04T08:57:05.9368988Z  * [new branch]              gh/jamesjwu/207/head        -> origin/gh/jamesjwu/207/head
2025-12-04T08:57:05.9370587Z  * [new branch]              gh/jamesjwu/207/orig        -> origin/gh/jamesjwu/207/orig
2025-12-04T08:57:05.9372777Z  * [new branch]              gh/jamesjwu/208/base        -> origin/gh/jamesjwu/208/base
2025-12-04T08:57:05.9374356Z  * [new branch]              gh/jamesjwu/208/head        -> origin/gh/jamesjwu/208/head
2025-12-04T08:57:05.9375998Z  * [new branch]              gh/jamesjwu/208/orig        -> origin/gh/jamesjwu/208/orig
2025-12-04T08:57:05.9378432Z  * [new branch]              gh/jamesjwu/52/base         -> origin/gh/jamesjwu/52/base
2025-12-04T08:57:05.9380162Z  * [new branch]              gh/jamesjwu/52/head         -> origin/gh/jamesjwu/52/head
2025-12-04T08:57:05.9382165Z  * [new branch]              gh/jamesjwu/53/base         -> origin/gh/jamesjwu/53/base
2025-12-04T08:57:05.9383692Z  * [new branch]              gh/jamesjwu/53/head         -> origin/gh/jamesjwu/53/head
2025-12-04T08:57:05.9385790Z  * [new branch]              gh/jamesjwu/54/base         -> origin/gh/jamesjwu/54/base
2025-12-04T08:57:05.9387284Z  * [new branch]              gh/jamesjwu/54/head         -> origin/gh/jamesjwu/54/head
2025-12-04T08:57:05.9389233Z  * [new branch]              gh/jamesjwu/55/base         -> origin/gh/jamesjwu/55/base
2025-12-04T08:57:05.9390872Z  * [new branch]              gh/jamesjwu/55/head         -> origin/gh/jamesjwu/55/head
2025-12-04T08:57:05.9392896Z  * [new branch]              gh/jamesjwu/56/base         -> origin/gh/jamesjwu/56/base
2025-12-04T08:57:05.9394446Z  * [new branch]              gh/jamesjwu/56/head         -> origin/gh/jamesjwu/56/head
2025-12-04T08:57:05.9396464Z  * [new branch]              gh/jamesjwu/57/base         -> origin/gh/jamesjwu/57/base
2025-12-04T08:57:05.9398007Z  * [new branch]              gh/jamesjwu/57/head         -> origin/gh/jamesjwu/57/head
2025-12-04T08:57:05.9400018Z  * [new branch]              gh/jamesjwu/58/base         -> origin/gh/jamesjwu/58/base
2025-12-04T08:57:05.9401646Z  * [new branch]              gh/jamesjwu/58/head         -> origin/gh/jamesjwu/58/head
2025-12-04T08:57:05.9403700Z  * [new branch]              gh/jamesjwu/59/base         -> origin/gh/jamesjwu/59/base
2025-12-04T08:57:05.9405217Z  * [new branch]              gh/jamesjwu/59/head         -> origin/gh/jamesjwu/59/head
2025-12-04T08:57:05.9407230Z  * [new branch]              gh/jamesjwu/60/base         -> origin/gh/jamesjwu/60/base
2025-12-04T08:57:05.9409323Z  * [new branch]              gh/jamesjwu/60/head         -> origin/gh/jamesjwu/60/head
2025-12-04T08:57:05.9411412Z  * [new branch]              gh/jamesjwu/61/base         -> origin/gh/jamesjwu/61/base
2025-12-04T08:57:05.9412955Z  * [new branch]              gh/jamesjwu/61/head         -> origin/gh/jamesjwu/61/head
2025-12-04T08:57:05.9414938Z  * [new branch]              gh/jamesjwu/62/base         -> origin/gh/jamesjwu/62/base
2025-12-04T08:57:05.9416501Z  * [new branch]              gh/jamesjwu/62/head         -> origin/gh/jamesjwu/62/head
2025-12-04T08:57:05.9419909Z  * [new branch]              gh/jamesjwu/63/base         -> origin/gh/jamesjwu/63/base
2025-12-04T08:57:05.9421442Z  * [new branch]              gh/jamesjwu/63/head         -> origin/gh/jamesjwu/63/head
2025-12-04T08:57:05.9424090Z  * [new branch]              gh/jamesjwu/64/base         -> origin/gh/jamesjwu/64/base
2025-12-04T08:57:05.9425723Z  * [new branch]              gh/jamesjwu/64/head         -> origin/gh/jamesjwu/64/head
2025-12-04T08:57:05.9427766Z  * [new branch]              gh/jamesjwu/65/base         -> origin/gh/jamesjwu/65/base
2025-12-04T08:57:05.9429298Z  * [new branch]              gh/jamesjwu/65/head         -> origin/gh/jamesjwu/65/head
2025-12-04T08:57:05.9432101Z  * [new branch]              gh/janeyx99/165/base        -> origin/gh/janeyx99/165/base
2025-12-04T08:57:05.9433731Z  * [new branch]              gh/janeyx99/165/head        -> origin/gh/janeyx99/165/head
2025-12-04T08:57:05.9435294Z  * [new branch]              gh/janeyx99/165/orig        -> origin/gh/janeyx99/165/orig
2025-12-04T08:57:05.9437256Z  * [new branch]              gh/janeyx99/201/base        -> origin/gh/janeyx99/201/base
2025-12-04T08:57:05.9438895Z  * [new branch]              gh/janeyx99/201/head        -> origin/gh/janeyx99/201/head
2025-12-04T08:57:05.9440639Z  * [new branch]              gh/janeyx99/201/orig        -> origin/gh/janeyx99/201/orig
2025-12-04T08:57:05.9443297Z  * [new branch]              gh/janeyx99/225/base        -> origin/gh/janeyx99/225/base
2025-12-04T08:57:05.9444864Z  * [new branch]              gh/janeyx99/225/head        -> origin/gh/janeyx99/225/head
2025-12-04T08:57:05.9446458Z  * [new branch]              gh/janeyx99/225/orig        -> origin/gh/janeyx99/225/orig
2025-12-04T08:57:05.9448581Z  * [new branch]              gh/janeyx99/299/base        -> origin/gh/janeyx99/299/base
2025-12-04T08:57:05.9450102Z  * [new branch]              gh/janeyx99/299/head        -> origin/gh/janeyx99/299/head
2025-12-04T08:57:05.9451994Z  * [new branch]              gh/janeyx99/299/orig        -> origin/gh/janeyx99/299/orig
2025-12-04T08:57:05.9454239Z  * [new branch]              gh/janeyx99/302/base        -> origin/gh/janeyx99/302/base
2025-12-04T08:57:05.9455785Z  * [new branch]              gh/janeyx99/302/head        -> origin/gh/janeyx99/302/head
2025-12-04T08:57:05.9457756Z  * [new branch]              gh/janeyx99/303/base        -> origin/gh/janeyx99/303/base
2025-12-04T08:57:05.9459313Z  * [new branch]              gh/janeyx99/303/head        -> origin/gh/janeyx99/303/head
2025-12-04T08:57:05.9461574Z  * [new branch]              gh/janeyx99/305/base        -> origin/gh/janeyx99/305/base
2025-12-04T08:57:05.9463118Z  * [new branch]              gh/janeyx99/305/head        -> origin/gh/janeyx99/305/head
2025-12-04T08:57:05.9465099Z  * [new branch]              gh/janeyx99/306/base        -> origin/gh/janeyx99/306/base
2025-12-04T08:57:05.9466707Z  * [new branch]              gh/janeyx99/306/head        -> origin/gh/janeyx99/306/head
2025-12-04T08:57:05.9468868Z  * [new branch]              gh/janeyx99/314/base        -> origin/gh/janeyx99/314/base
2025-12-04T08:57:05.9470497Z  * [new branch]              gh/janeyx99/314/head        -> origin/gh/janeyx99/314/head
2025-12-04T08:57:05.9472127Z  * [new branch]              gh/janeyx99/314/orig        -> origin/gh/janeyx99/314/orig
2025-12-04T08:57:05.9474313Z  * [new branch]              gh/janeyx99/315/base        -> origin/gh/janeyx99/315/base
2025-12-04T08:57:05.9475939Z  * [new branch]              gh/janeyx99/315/head        -> origin/gh/janeyx99/315/head
2025-12-04T08:57:05.9477473Z  * [new branch]              gh/janeyx99/315/orig        -> origin/gh/janeyx99/315/orig
2025-12-04T08:57:05.9479699Z  * [new branch]              gh/janeyx99/316/base        -> origin/gh/janeyx99/316/base
2025-12-04T08:57:05.9481554Z  * [new branch]              gh/janeyx99/316/head        -> origin/gh/janeyx99/316/head
2025-12-04T08:57:05.9483107Z  * [new branch]              gh/janeyx99/316/orig        -> origin/gh/janeyx99/316/orig
2025-12-04T08:57:05.9485330Z  * [new branch]              gh/janeyx99/317/base        -> origin/gh/janeyx99/317/base
2025-12-04T08:57:05.9486899Z  * [new branch]              gh/janeyx99/317/head        -> origin/gh/janeyx99/317/head
2025-12-04T08:57:05.9488499Z  * [new branch]              gh/janeyx99/317/orig        -> origin/gh/janeyx99/317/orig
2025-12-04T08:57:05.9490597Z  * [new branch]              gh/janeyx99/325/base        -> origin/gh/janeyx99/325/base
2025-12-04T08:57:05.9492192Z  * [new branch]              gh/janeyx99/325/head        -> origin/gh/janeyx99/325/head
2025-12-04T08:57:05.9493780Z  * [new branch]              gh/janeyx99/325/orig        -> origin/gh/janeyx99/325/orig
2025-12-04T08:57:05.9495915Z  * [new branch]              gh/janeyx99/327/base        -> origin/gh/janeyx99/327/base
2025-12-04T08:57:05.9497497Z  * [new branch]              gh/janeyx99/327/head        -> origin/gh/janeyx99/327/head
2025-12-04T08:57:05.9499573Z  * [new branch]              gh/janeyx99/327/orig        -> origin/gh/janeyx99/327/orig
2025-12-04T08:57:05.9501701Z  * [new branch]              gh/janeyx99/328/base        -> origin/gh/janeyx99/328/base
2025-12-04T08:57:05.9503438Z  * [new branch]              gh/janeyx99/328/head        -> origin/gh/janeyx99/328/head
2025-12-04T08:57:05.9505003Z  * [new branch]              gh/janeyx99/328/orig        -> origin/gh/janeyx99/328/orig
2025-12-04T08:57:05.9507037Z  * [new branch]              gh/janeyx99/329/base        -> origin/gh/janeyx99/329/base
2025-12-04T08:57:05.9508772Z  * [new branch]              gh/janeyx99/329/head        -> origin/gh/janeyx99/329/head
2025-12-04T08:57:05.9510425Z  * [new branch]              gh/janeyx99/329/orig        -> origin/gh/janeyx99/329/orig
2025-12-04T08:57:05.9513007Z  * [new branch]              gh/janeyx99/330/base        -> origin/gh/janeyx99/330/base
2025-12-04T08:57:05.9514619Z  * [new branch]              gh/janeyx99/330/head        -> origin/gh/janeyx99/330/head
2025-12-04T08:57:05.9516316Z  * [new branch]              gh/janeyx99/330/orig        -> origin/gh/janeyx99/330/orig
2025-12-04T08:57:05.9518804Z  * [new branch]              gh/janeyx99/331/base        -> origin/gh/janeyx99/331/base
2025-12-04T08:57:05.9520211Z  * [new branch]              gh/janeyx99/331/head        -> origin/gh/janeyx99/331/head
2025-12-04T08:57:05.9521859Z  * [new branch]              gh/janeyx99/331/orig        -> origin/gh/janeyx99/331/orig
2025-12-04T08:57:05.9523988Z  * [new branch]              gh/janeyx99/332/base        -> origin/gh/janeyx99/332/base
2025-12-04T08:57:05.9525628Z  * [new branch]              gh/janeyx99/332/head        -> origin/gh/janeyx99/332/head
2025-12-04T08:57:05.9527236Z  * [new branch]              gh/janeyx99/332/orig        -> origin/gh/janeyx99/332/orig
2025-12-04T08:57:05.9529336Z  * [new branch]              gh/janeyx99/333/base        -> origin/gh/janeyx99/333/base
2025-12-04T08:57:05.9531031Z  * [new branch]              gh/janeyx99/333/head        -> origin/gh/janeyx99/333/head
2025-12-04T08:57:05.9532584Z  * [new branch]              gh/janeyx99/333/orig        -> origin/gh/janeyx99/333/orig
2025-12-04T08:57:05.9534856Z  * [new branch]              gh/janeyx99/88/base         -> origin/gh/janeyx99/88/base
2025-12-04T08:57:05.9536389Z  * [new branch]              gh/janeyx99/88/head         -> origin/gh/janeyx99/88/head
2025-12-04T08:57:05.9537935Z  * [new branch]              gh/janeyx99/88/orig         -> origin/gh/janeyx99/88/orig
2025-12-04T08:57:05.9540457Z  * [new branch]              gh/jansel/360/base          -> origin/gh/jansel/360/base
2025-12-04T08:57:05.9542080Z  * [new branch]              gh/jansel/360/head          -> origin/gh/jansel/360/head
2025-12-04T08:57:05.9544183Z  * [new branch]              gh/jansel/451/base          -> origin/gh/jansel/451/base
2025-12-04T08:57:05.9545838Z  * [new branch]              gh/jansel/451/head          -> origin/gh/jansel/451/head
2025-12-04T08:57:05.9547626Z  * [new branch]              gh/jansel/451/orig          -> origin/gh/jansel/451/orig
2025-12-04T08:57:05.9549842Z  * [new branch]              gh/jansel/462/base          -> origin/gh/jansel/462/base
2025-12-04T08:57:05.9551393Z  * [new branch]              gh/jansel/462/head          -> origin/gh/jansel/462/head
2025-12-04T08:57:05.9552964Z  * [new branch]              gh/jansel/462/orig          -> origin/gh/jansel/462/orig
2025-12-04T08:57:05.9555094Z  * [new branch]              gh/jansel/533/base          -> origin/gh/jansel/533/base
2025-12-04T08:57:05.9556660Z  * [new branch]              gh/jansel/533/head          -> origin/gh/jansel/533/head
2025-12-04T08:57:05.9558218Z  * [new branch]              gh/jansel/533/orig          -> origin/gh/jansel/533/orig
2025-12-04T08:57:05.9560285Z  * [new branch]              gh/jansel/552/base          -> origin/gh/jansel/552/base
2025-12-04T08:57:05.9561990Z  * [new branch]              gh/jansel/552/head          -> origin/gh/jansel/552/head
2025-12-04T08:57:05.9563494Z  * [new branch]              gh/jansel/552/orig          -> origin/gh/jansel/552/orig
2025-12-04T08:57:05.9565654Z  * [new branch]              gh/jansel/553/base          -> origin/gh/jansel/553/base
2025-12-04T08:57:05.9567318Z  * [new branch]              gh/jansel/553/head          -> origin/gh/jansel/553/head
2025-12-04T08:57:05.9568902Z  * [new branch]              gh/jansel/553/orig          -> origin/gh/jansel/553/orig
2025-12-04T08:57:05.9571161Z  * [new branch]              gh/jansel/554/base          -> origin/gh/jansel/554/base
2025-12-04T08:57:05.9572666Z  * [new branch]              gh/jansel/554/head          -> origin/gh/jansel/554/head
2025-12-04T08:57:05.9574237Z  * [new branch]              gh/jansel/554/orig          -> origin/gh/jansel/554/orig
2025-12-04T08:57:05.9576322Z  * [new branch]              gh/jansel/555/base          -> origin/gh/jansel/555/base
2025-12-04T08:57:05.9577892Z  * [new branch]              gh/jansel/555/head          -> origin/gh/jansel/555/head
2025-12-04T08:57:05.9579536Z  * [new branch]              gh/jansel/555/orig          -> origin/gh/jansel/555/orig
2025-12-04T08:57:05.9581756Z  * [new branch]              gh/jansel/556/base          -> origin/gh/jansel/556/base
2025-12-04T08:57:05.9583300Z  * [new branch]              gh/jansel/556/head          -> origin/gh/jansel/556/head
2025-12-04T08:57:05.9584901Z  * [new branch]              gh/jansel/556/orig          -> origin/gh/jansel/556/orig
2025-12-04T08:57:05.9587170Z  * [new branch]              gh/jansel/557/base          -> origin/gh/jansel/557/base
2025-12-04T08:57:05.9589293Z  * [new branch]              gh/jansel/557/head          -> origin/gh/jansel/557/head
2025-12-04T08:57:05.9591686Z  * [new branch]              gh/jansel/557/orig          -> origin/gh/jansel/557/orig
2025-12-04T08:57:05.9594737Z  * [new branch]              gh/jansel/558/base          -> origin/gh/jansel/558/base
2025-12-04T08:57:05.9596963Z  * [new branch]              gh/jansel/558/head          -> origin/gh/jansel/558/head
2025-12-04T08:57:05.9599160Z  * [new branch]              gh/jansel/558/orig          -> origin/gh/jansel/558/orig
2025-12-04T08:57:05.9602317Z  * [new branch]              gh/jansel/559/base          -> origin/gh/jansel/559/base
2025-12-04T08:57:05.9604440Z  * [new branch]              gh/jansel/559/head          -> origin/gh/jansel/559/head
2025-12-04T08:57:05.9606847Z  * [new branch]              gh/jansel/559/orig          -> origin/gh/jansel/559/orig
2025-12-04T08:57:05.9609449Z  * [new branch]              gh/jansel/560/base          -> origin/gh/jansel/560/base
2025-12-04T08:57:05.9612126Z  * [new branch]              gh/jansel/560/head          -> origin/gh/jansel/560/head
2025-12-04T08:57:05.9614320Z  * [new branch]              gh/jansel/560/orig          -> origin/gh/jansel/560/orig
2025-12-04T08:57:05.9617360Z  * [new branch]              gh/jansel/561/base          -> origin/gh/jansel/561/base
2025-12-04T08:57:05.9618997Z  * [new branch]              gh/jansel/561/head          -> origin/gh/jansel/561/head
2025-12-04T08:57:05.9620651Z  * [new branch]              gh/jansel/561/orig          -> origin/gh/jansel/561/orig
2025-12-04T08:57:05.9623316Z  * [new branch]              gh/jansel/562/base          -> origin/gh/jansel/562/base
2025-12-04T08:57:05.9625278Z  * [new branch]              gh/jansel/562/head          -> origin/gh/jansel/562/head
2025-12-04T08:57:05.9627305Z  * [new branch]              gh/jansel/562/orig          -> origin/gh/jansel/562/orig
2025-12-04T08:57:05.9629514Z  * [new branch]              gh/jansel/563/base          -> origin/gh/jansel/563/base
2025-12-04T08:57:05.9631183Z  * [new branch]              gh/jansel/563/head          -> origin/gh/jansel/563/head
2025-12-04T08:57:05.9632770Z  * [new branch]              gh/jansel/563/orig          -> origin/gh/jansel/563/orig
2025-12-04T08:57:05.9635528Z  * [new branch]              gh/jansel/564/base          -> origin/gh/jansel/564/base
2025-12-04T08:57:05.9637128Z  * [new branch]              gh/jansel/564/head          -> origin/gh/jansel/564/head
2025-12-04T08:57:05.9638746Z  * [new branch]              gh/jansel/564/orig          -> origin/gh/jansel/564/orig
2025-12-04T08:57:05.9641164Z  * [new branch]              gh/jansel/565/base          -> origin/gh/jansel/565/base
2025-12-04T08:57:05.9642794Z  * [new branch]              gh/jansel/565/head          -> origin/gh/jansel/565/head
2025-12-04T08:57:05.9644545Z  * [new branch]              gh/jansel/565/orig          -> origin/gh/jansel/565/orig
2025-12-04T08:57:05.9646774Z  * [new branch]              gh/jansel/566/base          -> origin/gh/jansel/566/base
2025-12-04T08:57:05.9648411Z  * [new branch]              gh/jansel/566/head          -> origin/gh/jansel/566/head
2025-12-04T08:57:05.9649965Z  * [new branch]              gh/jansel/566/orig          -> origin/gh/jansel/566/orig
2025-12-04T08:57:05.9652135Z  * [new branch]              gh/jansel/567/base          -> origin/gh/jansel/567/base
2025-12-04T08:57:05.9654125Z  * [new branch]              gh/jansel/567/head          -> origin/gh/jansel/567/head
2025-12-04T08:57:05.9656051Z  * [new branch]              gh/jansel/567/orig          -> origin/gh/jansel/567/orig
2025-12-04T08:57:05.9658243Z  * [new branch]              gh/jansel/568/base          -> origin/gh/jansel/568/base
2025-12-04T08:57:05.9659851Z  * [new branch]              gh/jansel/568/head          -> origin/gh/jansel/568/head
2025-12-04T08:57:05.9661528Z  * [new branch]              gh/jansel/568/orig          -> origin/gh/jansel/568/orig
2025-12-04T08:57:05.9663704Z  * [new branch]              gh/jansel/569/base          -> origin/gh/jansel/569/base
2025-12-04T08:57:05.9665353Z  * [new branch]              gh/jansel/569/head          -> origin/gh/jansel/569/head
2025-12-04T08:57:05.9666906Z  * [new branch]              gh/jansel/569/orig          -> origin/gh/jansel/569/orig
2025-12-04T08:57:05.9669041Z  * [new branch]              gh/jansel/570/base          -> origin/gh/jansel/570/base
2025-12-04T08:57:05.9670585Z  * [new branch]              gh/jansel/570/head          -> origin/gh/jansel/570/head
2025-12-04T08:57:05.9672692Z  * [new branch]              gh/jansel/570/orig          -> origin/gh/jansel/570/orig
2025-12-04T08:57:05.9675299Z  * [new branch]              gh/jansel/571/base          -> origin/gh/jansel/571/base
2025-12-04T08:57:05.9676885Z  * [new branch]              gh/jansel/571/head          -> origin/gh/jansel/571/head
2025-12-04T08:57:05.9678448Z  * [new branch]              gh/jansel/571/orig          -> origin/gh/jansel/571/orig
2025-12-04T08:57:05.9680832Z  * [new branch]              gh/jansel/572/base          -> origin/gh/jansel/572/base
2025-12-04T08:57:05.9682338Z  * [new branch]              gh/jansel/572/head          -> origin/gh/jansel/572/head
2025-12-04T08:57:05.9683909Z  * [new branch]              gh/jansel/572/orig          -> origin/gh/jansel/572/orig
2025-12-04T08:57:05.9686126Z  * [new branch]              gh/jansel/573/base          -> origin/gh/jansel/573/base
2025-12-04T08:57:05.9687805Z  * [new branch]              gh/jansel/573/head          -> origin/gh/jansel/573/head
2025-12-04T08:57:05.9689409Z  * [new branch]              gh/jansel/573/orig          -> origin/gh/jansel/573/orig
2025-12-04T08:57:05.9691626Z  * [new branch]              gh/jansel/574/base          -> origin/gh/jansel/574/base
2025-12-04T08:57:05.9693191Z  * [new branch]              gh/jansel/574/head          -> origin/gh/jansel/574/head
2025-12-04T08:57:05.9694762Z  * [new branch]              gh/jansel/574/orig          -> origin/gh/jansel/574/orig
2025-12-04T08:57:05.9697015Z  * [new branch]              gh/jansel/575/base          -> origin/gh/jansel/575/base
2025-12-04T08:57:05.9698582Z  * [new branch]              gh/jansel/575/head          -> origin/gh/jansel/575/head
2025-12-04T08:57:05.9700100Z  * [new branch]              gh/jansel/575/orig          -> origin/gh/jansel/575/orig
2025-12-04T08:57:05.9702799Z  * [new branch]              gh/jansel/576/base          -> origin/gh/jansel/576/base
2025-12-04T08:57:05.9704410Z  * [new branch]              gh/jansel/576/head          -> origin/gh/jansel/576/head
2025-12-04T08:57:05.9706107Z  * [new branch]              gh/jansel/576/orig          -> origin/gh/jansel/576/orig
2025-12-04T08:57:05.9708695Z  * [new branch]              gh/jbschlosser/247/base     -> origin/gh/jbschlosser/247/base
2025-12-04T08:57:05.9710237Z  * [new branch]              gh/jbschlosser/247/head     -> origin/gh/jbschlosser/247/head
2025-12-04T08:57:05.9711797Z  * [new branch]              gh/jbschlosser/247/orig     -> origin/gh/jbschlosser/247/orig
2025-12-04T08:57:05.9713982Z  * [new branch]              gh/jbschlosser/250/base     -> origin/gh/jbschlosser/250/base
2025-12-04T08:57:05.9715487Z  * [new branch]              gh/jbschlosser/250/head     -> origin/gh/jbschlosser/250/head
2025-12-04T08:57:05.9717215Z  * [new branch]              gh/jbschlosser/250/orig     -> origin/gh/jbschlosser/250/orig
2025-12-04T08:57:05.9720062Z  * [new branch]              gh/jerryzh168/1/base        -> origin/gh/jerryzh168/1/base
2025-12-04T08:57:05.9721932Z  * [new branch]              gh/jerryzh168/1/head        -> origin/gh/jerryzh168/1/head
2025-12-04T08:57:05.9723214Z  * [new branch]              gh/jerryzh168/1/orig        -> origin/gh/jerryzh168/1/orig
2025-12-04T08:57:05.9725938Z  * [new branch]              gh/jiayisunx/59/base        -> origin/gh/jiayisunx/59/base
2025-12-04T08:57:05.9727500Z  * [new branch]              gh/jiayisunx/59/head        -> origin/gh/jiayisunx/59/head
2025-12-04T08:57:05.9729206Z  * [new branch]              gh/jiayisunx/59/orig        -> origin/gh/jiayisunx/59/orig
2025-12-04T08:57:05.9731196Z  * [new branch]              gh/jiayisunx/61/base        -> origin/gh/jiayisunx/61/base
2025-12-04T08:57:05.9732795Z  * [new branch]              gh/jiayisunx/61/head        -> origin/gh/jiayisunx/61/head
2025-12-04T08:57:05.9734349Z  * [new branch]              gh/jiayisunx/61/orig        -> origin/gh/jiayisunx/61/orig
2025-12-04T08:57:05.9736481Z  * [new branch]              gh/jiayisunx/68/base        -> origin/gh/jiayisunx/68/base
2025-12-04T08:57:05.9738031Z  * [new branch]              gh/jiayisunx/68/head        -> origin/gh/jiayisunx/68/head
2025-12-04T08:57:05.9739502Z  * [new branch]              gh/jiayisunx/68/orig        -> origin/gh/jiayisunx/68/orig
2025-12-04T08:57:05.9742005Z  * [new branch]              gh/jiayisunx/77/base        -> origin/gh/jiayisunx/77/base
2025-12-04T08:57:05.9743275Z  * [new branch]              gh/jiayisunx/77/head        -> origin/gh/jiayisunx/77/head
2025-12-04T08:57:05.9745107Z  * [new branch]              gh/jiayisunx/77/orig        -> origin/gh/jiayisunx/77/orig
2025-12-04T08:57:05.9747453Z  * [new branch]              gh/jiayisunx/78/base        -> origin/gh/jiayisunx/78/base
2025-12-04T08:57:05.9748820Z  * [new branch]              gh/jiayisunx/78/head        -> origin/gh/jiayisunx/78/head
2025-12-04T08:57:05.9750641Z  * [new branch]              gh/jiayisunx/78/orig        -> origin/gh/jiayisunx/78/orig
2025-12-04T08:57:05.9752815Z  * [new branch]              gh/jiayisunx/79/base        -> origin/gh/jiayisunx/79/base
2025-12-04T08:57:05.9754197Z  * [new branch]              gh/jiayisunx/79/head        -> origin/gh/jiayisunx/79/head
2025-12-04T08:57:05.9755902Z  * [new branch]              gh/jiayisunx/79/orig        -> origin/gh/jiayisunx/79/orig
2025-12-04T08:57:05.9758027Z  * [new branch]              gh/jiayisunx/82/base        -> origin/gh/jiayisunx/82/base
2025-12-04T08:57:05.9759516Z  * [new branch]              gh/jiayisunx/82/head        -> origin/gh/jiayisunx/82/head
2025-12-04T08:57:05.9761321Z  * [new branch]              gh/jiayisunx/82/orig        -> origin/gh/jiayisunx/82/orig
2025-12-04T08:57:05.9763319Z  * [new branch]              gh/jiayisunx/83/base        -> origin/gh/jiayisunx/83/base
2025-12-04T08:57:05.9764984Z  * [new branch]              gh/jiayisunx/83/head        -> origin/gh/jiayisunx/83/head
2025-12-04T08:57:05.9766597Z  * [new branch]              gh/jiayisunx/83/orig        -> origin/gh/jiayisunx/83/orig
2025-12-04T08:57:05.9768600Z  * [new branch]              gh/jiayisunx/84/base        -> origin/gh/jiayisunx/84/base
2025-12-04T08:57:05.9770212Z  * [new branch]              gh/jiayisunx/84/head        -> origin/gh/jiayisunx/84/head
2025-12-04T08:57:05.9771816Z  * [new branch]              gh/jiayisunx/84/orig        -> origin/gh/jiayisunx/84/orig
2025-12-04T08:57:05.9773951Z  * [new branch]              gh/jiayisunx/85/base        -> origin/gh/jiayisunx/85/base
2025-12-04T08:57:05.9775485Z  * [new branch]              gh/jiayisunx/85/head        -> origin/gh/jiayisunx/85/head
2025-12-04T08:57:05.9777009Z  * [new branch]              gh/jiayisunx/85/orig        -> origin/gh/jiayisunx/85/orig
2025-12-04T08:57:05.9779033Z  * [new branch]              gh/jiayisunx/86/base        -> origin/gh/jiayisunx/86/base
2025-12-04T08:57:05.9780669Z  * [new branch]              gh/jiayisunx/86/head        -> origin/gh/jiayisunx/86/head
2025-12-04T08:57:05.9782196Z  * [new branch]              gh/jiayisunx/86/orig        -> origin/gh/jiayisunx/86/orig
2025-12-04T08:57:05.9784581Z  * [new branch]              gh/jiayisunx/87/base        -> origin/gh/jiayisunx/87/base
2025-12-04T08:57:05.9785804Z  * [new branch]              gh/jiayisunx/87/head        -> origin/gh/jiayisunx/87/head
2025-12-04T08:57:05.9787579Z  * [new branch]              gh/jiayisunx/87/orig        -> origin/gh/jiayisunx/87/orig
2025-12-04T08:57:05.9789640Z  * [new branch]              gh/jiayisunx/88/base        -> origin/gh/jiayisunx/88/base
2025-12-04T08:57:05.9791238Z  * [new branch]              gh/jiayisunx/88/head        -> origin/gh/jiayisunx/88/head
2025-12-04T08:57:05.9792866Z  * [new branch]              gh/jiayisunx/88/orig        -> origin/gh/jiayisunx/88/orig
2025-12-04T08:57:05.9795014Z  * [new branch]              gh/jiayisunx/89/base        -> origin/gh/jiayisunx/89/base
2025-12-04T08:57:05.9796563Z  * [new branch]              gh/jiayisunx/89/head        -> origin/gh/jiayisunx/89/head
2025-12-04T08:57:05.9798250Z  * [new branch]              gh/jiayisunx/89/orig        -> origin/gh/jiayisunx/89/orig
2025-12-04T08:57:05.9800486Z  * [new branch]              gh/jiayisunx/90/base        -> origin/gh/jiayisunx/90/base
2025-12-04T08:57:05.9802160Z  * [new branch]              gh/jiayisunx/90/head        -> origin/gh/jiayisunx/90/head
2025-12-04T08:57:05.9803698Z  * [new branch]              gh/jiayisunx/90/orig        -> origin/gh/jiayisunx/90/orig
2025-12-04T08:57:05.9806134Z  * [new branch]              gh/jjwu@meta.com/1/base     -> origin/gh/jjwu@meta.com/1/base
2025-12-04T08:57:05.9807703Z  * [new branch]              gh/jjwu@meta.com/1/head     -> origin/gh/jjwu@meta.com/1/head
2025-12-04T08:57:05.9810615Z  * [new branch]              gh/jturney/1/base           -> origin/gh/jturney/1/base
2025-12-04T08:57:05.9812309Z  * [new branch]              gh/jturney/1/head           -> origin/gh/jturney/1/head
2025-12-04T08:57:05.9813869Z  * [new branch]              gh/jturney/1/orig           -> origin/gh/jturney/1/orig
2025-12-04T08:57:05.9815982Z  * [new branch]              gh/jturney/2/base           -> origin/gh/jturney/2/base
2025-12-04T08:57:05.9817854Z  * [new branch]              gh/jturney/2/head           -> origin/gh/jturney/2/head
2025-12-04T08:57:05.9821181Z  * [new branch]              gh/jturney/2/orig           -> origin/gh/jturney/2/orig
2025-12-04T08:57:05.9823778Z  * [new branch]              gh/karthickai/10/base       -> origin/gh/karthickai/10/base
2025-12-04T08:57:05.9825473Z  * [new branch]              gh/karthickai/10/head       -> origin/gh/karthickai/10/head
2025-12-04T08:57:05.9827063Z  * [new branch]              gh/karthickai/10/orig       -> origin/gh/karthickai/10/orig
2025-12-04T08:57:05.9829164Z  * [new branch]              gh/karthickai/11/base       -> origin/gh/karthickai/11/base
2025-12-04T08:57:05.9830832Z  * [new branch]              gh/karthickai/11/head       -> origin/gh/karthickai/11/head
2025-12-04T08:57:05.9832429Z  * [new branch]              gh/karthickai/11/orig       -> origin/gh/karthickai/11/orig
2025-12-04T08:57:05.9834919Z  * [new branch]              gh/karthickai/12/base       -> origin/gh/karthickai/12/base
2025-12-04T08:57:05.9836639Z  * [new branch]              gh/karthickai/12/head       -> origin/gh/karthickai/12/head
2025-12-04T08:57:05.9838272Z  * [new branch]              gh/karthickai/12/orig       -> origin/gh/karthickai/12/orig
2025-12-04T08:57:05.9840533Z  * [new branch]              gh/karthickai/13/base       -> origin/gh/karthickai/13/base
2025-12-04T08:57:05.9842180Z  * [new branch]              gh/karthickai/13/head       -> origin/gh/karthickai/13/head
2025-12-04T08:57:05.9843717Z  * [new branch]              gh/karthickai/13/orig       -> origin/gh/karthickai/13/orig
2025-12-04T08:57:05.9845964Z  * [new branch]              gh/karthickai/14/base       -> origin/gh/karthickai/14/base
2025-12-04T08:57:05.9847612Z  * [new branch]              gh/karthickai/14/head       -> origin/gh/karthickai/14/head
2025-12-04T08:57:05.9849247Z  * [new branch]              gh/karthickai/14/orig       -> origin/gh/karthickai/14/orig
2025-12-04T08:57:05.9851729Z  * [new branch]              gh/karthickai/15/base       -> origin/gh/karthickai/15/base
2025-12-04T08:57:05.9853217Z  * [new branch]              gh/karthickai/15/head       -> origin/gh/karthickai/15/head
2025-12-04T08:57:05.9854734Z  * [new branch]              gh/karthickai/15/orig       -> origin/gh/karthickai/15/orig
2025-12-04T08:57:05.9856863Z  * [new branch]              gh/karthickai/16/base       -> origin/gh/karthickai/16/base
2025-12-04T08:57:05.9858882Z  * [new branch]              gh/karthickai/16/head       -> origin/gh/karthickai/16/head
2025-12-04T08:57:05.9860435Z  * [new branch]              gh/karthickai/16/orig       -> origin/gh/karthickai/16/orig
2025-12-04T08:57:05.9862569Z  * [new branch]              gh/karthickai/17/base       -> origin/gh/karthickai/17/base
2025-12-04T08:57:05.9864053Z  * [new branch]              gh/karthickai/17/head       -> origin/gh/karthickai/17/head
2025-12-04T08:57:05.9865617Z  * [new branch]              gh/karthickai/17/orig       -> origin/gh/karthickai/17/orig
2025-12-04T08:57:05.9867822Z  * [new branch]              gh/karthickai/18/base       -> origin/gh/karthickai/18/base
2025-12-04T08:57:05.9869627Z  * [new branch]              gh/karthickai/18/head       -> origin/gh/karthickai/18/head
2025-12-04T08:57:05.9871285Z  * [new branch]              gh/karthickai/18/orig       -> origin/gh/karthickai/18/orig
2025-12-04T08:57:05.9873478Z  * [new branch]              gh/karthickai/19/base       -> origin/gh/karthickai/19/base
2025-12-04T08:57:05.9875530Z  * [new branch]              gh/karthickai/19/head       -> origin/gh/karthickai/19/head
2025-12-04T08:57:05.9877335Z  * [new branch]              gh/karthickai/19/orig       -> origin/gh/karthickai/19/orig
2025-12-04T08:57:05.9880126Z  * [new branch]              gh/karthickai/20/base       -> origin/gh/karthickai/20/base
2025-12-04T08:57:05.9882337Z  * [new branch]              gh/karthickai/20/head       -> origin/gh/karthickai/20/head
2025-12-04T08:57:05.9884005Z  * [new branch]              gh/karthickai/20/orig       -> origin/gh/karthickai/20/orig
2025-12-04T08:57:05.9886132Z  * [new branch]              gh/karthickai/21/base       -> origin/gh/karthickai/21/base
2025-12-04T08:57:05.9887854Z  * [new branch]              gh/karthickai/21/head       -> origin/gh/karthickai/21/head
2025-12-04T08:57:05.9889616Z  * [new branch]              gh/karthickai/21/orig       -> origin/gh/karthickai/21/orig
2025-12-04T08:57:05.9891927Z  * [new branch]              gh/karthickai/22/base       -> origin/gh/karthickai/22/base
2025-12-04T08:57:05.9893457Z  * [new branch]              gh/karthickai/22/head       -> origin/gh/karthickai/22/head
2025-12-04T08:57:05.9894987Z  * [new branch]              gh/karthickai/22/orig       -> origin/gh/karthickai/22/orig
2025-12-04T08:57:05.9897287Z  * [new branch]              gh/karthickai/23/base       -> origin/gh/karthickai/23/base
2025-12-04T08:57:05.9898981Z  * [new branch]              gh/karthickai/23/head       -> origin/gh/karthickai/23/head
2025-12-04T08:57:05.9900540Z  * [new branch]              gh/karthickai/23/orig       -> origin/gh/karthickai/23/orig
2025-12-04T08:57:05.9902723Z  * [new branch]              gh/karthickai/24/base       -> origin/gh/karthickai/24/base
2025-12-04T08:57:05.9904356Z  * [new branch]              gh/karthickai/24/head       -> origin/gh/karthickai/24/head
2025-12-04T08:57:05.9905999Z  * [new branch]              gh/karthickai/24/orig       -> origin/gh/karthickai/24/orig
2025-12-04T08:57:05.9908584Z  * [new branch]              gh/karthickai/25/base       -> origin/gh/karthickai/25/base
2025-12-04T08:57:05.9910253Z  * [new branch]              gh/karthickai/25/head       -> origin/gh/karthickai/25/head
2025-12-04T08:57:05.9911816Z  * [new branch]              gh/karthickai/25/orig       -> origin/gh/karthickai/25/orig
2025-12-04T08:57:05.9913877Z  * [new branch]              gh/karthickai/26/base       -> origin/gh/karthickai/26/base
2025-12-04T08:57:05.9915562Z  * [new branch]              gh/karthickai/26/head       -> origin/gh/karthickai/26/head
2025-12-04T08:57:05.9917624Z  * [new branch]              gh/karthickai/26/orig       -> origin/gh/karthickai/26/orig
2025-12-04T08:57:05.9921016Z  * [new branch]              gh/karthickai/6/base        -> origin/gh/karthickai/6/base
2025-12-04T08:57:05.9922939Z  * [new branch]              gh/karthickai/6/head        -> origin/gh/karthickai/6/head
2025-12-04T08:57:05.9924494Z  * [new branch]              gh/karthickai/6/orig        -> origin/gh/karthickai/6/orig
2025-12-04T08:57:05.9927158Z  * [new branch]              gh/krocki/1/base            -> origin/gh/krocki/1/base
2025-12-04T08:57:05.9928676Z  * [new branch]              gh/krocki/1/head            -> origin/gh/krocki/1/head
2025-12-04T08:57:05.9930283Z  * [new branch]              gh/krocki/1/orig            -> origin/gh/krocki/1/orig
2025-12-04T08:57:05.9932401Z  * [new branch]              gh/krocki/2/base            -> origin/gh/krocki/2/base
2025-12-04T08:57:05.9933965Z  * [new branch]              gh/krocki/2/head            -> origin/gh/krocki/2/head
2025-12-04T08:57:05.9935517Z  * [new branch]              gh/krocki/2/orig            -> origin/gh/krocki/2/orig
2025-12-04T08:57:05.9938147Z  * [new branch]              gh/kurtamohler/60/base      -> origin/gh/kurtamohler/60/base
2025-12-04T08:57:05.9939831Z  * [new branch]              gh/kurtamohler/60/head      -> origin/gh/kurtamohler/60/head
2025-12-04T08:57:05.9941384Z  * [new branch]              gh/kurtamohler/60/orig      -> origin/gh/kurtamohler/60/orig
2025-12-04T08:57:05.9943570Z  * [new branch]              gh/kurtamohler/61/base      -> origin/gh/kurtamohler/61/base
2025-12-04T08:57:05.9945602Z  * [new branch]              gh/kurtamohler/61/head      -> origin/gh/kurtamohler/61/head
2025-12-04T08:57:05.9947287Z  * [new branch]              gh/kurtamohler/61/orig      -> origin/gh/kurtamohler/61/orig
2025-12-04T08:57:05.9949365Z  * [new branch]              gh/kurtamohler/62/base      -> origin/gh/kurtamohler/62/base
2025-12-04T08:57:05.9950935Z  * [new branch]              gh/kurtamohler/62/head      -> origin/gh/kurtamohler/62/head
2025-12-04T08:57:05.9952534Z  * [new branch]              gh/kurtamohler/62/orig      -> origin/gh/kurtamohler/62/orig
2025-12-04T08:57:05.9954666Z  * [new branch]              gh/kurtamohler/63/base      -> origin/gh/kurtamohler/63/base
2025-12-04T08:57:05.9956290Z  * [new branch]              gh/kurtamohler/63/head      -> origin/gh/kurtamohler/63/head
2025-12-04T08:57:05.9957849Z  * [new branch]              gh/kurtamohler/63/orig      -> origin/gh/kurtamohler/63/orig
2025-12-04T08:57:05.9960079Z  * [new branch]              gh/kurtamohler/64/base      -> origin/gh/kurtamohler/64/base
2025-12-04T08:57:05.9961706Z  * [new branch]              gh/kurtamohler/64/head      -> origin/gh/kurtamohler/64/head
2025-12-04T08:57:05.9963261Z  * [new branch]              gh/kurtamohler/64/orig      -> origin/gh/kurtamohler/64/orig
2025-12-04T08:57:05.9965363Z  * [new branch]              gh/kurtamohler/65/base      -> origin/gh/kurtamohler/65/base
2025-12-04T08:57:05.9966976Z  * [new branch]              gh/kurtamohler/65/head      -> origin/gh/kurtamohler/65/head
2025-12-04T08:57:05.9968568Z  * [new branch]              gh/kurtamohler/65/orig      -> origin/gh/kurtamohler/65/orig
2025-12-04T08:57:05.9970608Z  * [new branch]              gh/kurtamohler/66/base      -> origin/gh/kurtamohler/66/base
2025-12-04T08:57:05.9972199Z  * [new branch]              gh/kurtamohler/66/head      -> origin/gh/kurtamohler/66/head
2025-12-04T08:57:05.9973790Z  * [new branch]              gh/kurtamohler/66/orig      -> origin/gh/kurtamohler/66/orig
2025-12-04T08:57:05.9975961Z  * [new branch]              gh/kurtamohler/67/base      -> origin/gh/kurtamohler/67/base
2025-12-04T08:57:05.9977547Z  * [new branch]              gh/kurtamohler/67/head      -> origin/gh/kurtamohler/67/head
2025-12-04T08:57:05.9979145Z  * [new branch]              gh/kurtamohler/67/orig      -> origin/gh/kurtamohler/67/orig
2025-12-04T08:57:05.9981869Z  * [new branch]              gh/kwen2501/130/base        -> origin/gh/kwen2501/130/base
2025-12-04T08:57:05.9983622Z  * [new branch]              gh/kwen2501/130/head        -> origin/gh/kwen2501/130/head
2025-12-04T08:57:05.9985163Z  * [new branch]              gh/kwen2501/130/orig        -> origin/gh/kwen2501/130/orig
2025-12-04T08:57:05.9987307Z  * [new branch]              gh/kwen2501/170/base        -> origin/gh/kwen2501/170/base
2025-12-04T08:57:05.9988969Z  * [new branch]              gh/kwen2501/170/head        -> origin/gh/kwen2501/170/head
2025-12-04T08:57:05.9991165Z  * [new branch]              gh/kwen2501/187/base        -> origin/gh/kwen2501/187/base
2025-12-04T08:57:05.9992786Z  * [new branch]              gh/kwen2501/187/head        -> origin/gh/kwen2501/187/head
2025-12-04T08:57:05.9994399Z  * [new branch]              gh/kwen2501/187/orig        -> origin/gh/kwen2501/187/orig
2025-12-04T08:57:05.9996566Z  * [new branch]              gh/kwen2501/188/base        -> origin/gh/kwen2501/188/base
2025-12-04T08:57:05.9998114Z  * [new branch]              gh/kwen2501/188/head        -> origin/gh/kwen2501/188/head
2025-12-04T08:57:05.9999708Z  * [new branch]              gh/kwen2501/188/orig        -> origin/gh/kwen2501/188/orig
2025-12-04T08:57:06.0001993Z  * [new branch]              gh/kwen2501/211/base        -> origin/gh/kwen2501/211/base
2025-12-04T08:57:06.0003570Z  * [new branch]              gh/kwen2501/211/head        -> origin/gh/kwen2501/211/head
2025-12-04T08:57:06.0005650Z  * [new branch]              gh/kwen2501/224/base        -> origin/gh/kwen2501/224/base
2025-12-04T08:57:06.0007209Z  * [new branch]              gh/kwen2501/224/head        -> origin/gh/kwen2501/224/head
2025-12-04T08:57:06.0008794Z  * [new branch]              gh/kwen2501/224/orig        -> origin/gh/kwen2501/224/orig
2025-12-04T08:57:06.0010961Z  * [new branch]              gh/kwen2501/228/base        -> origin/gh/kwen2501/228/base
2025-12-04T08:57:06.0012543Z  * [new branch]              gh/kwen2501/228/head        -> origin/gh/kwen2501/228/head
2025-12-04T08:57:06.0014128Z  * [new branch]              gh/kwen2501/228/orig        -> origin/gh/kwen2501/228/orig
2025-12-04T08:57:06.0016329Z  * [new branch]              gh/kwen2501/234/base        -> origin/gh/kwen2501/234/base
2025-12-04T08:57:06.0018179Z  * [new branch]              gh/kwen2501/234/head        -> origin/gh/kwen2501/234/head
2025-12-04T08:57:06.0019717Z  * [new branch]              gh/kwen2501/234/orig        -> origin/gh/kwen2501/234/orig
2025-12-04T08:57:06.0021885Z  * [new branch]              gh/kwen2501/235/base        -> origin/gh/kwen2501/235/base
2025-12-04T08:57:06.0023475Z  * [new branch]              gh/kwen2501/235/head        -> origin/gh/kwen2501/235/head
2025-12-04T08:57:06.0025051Z  * [new branch]              gh/kwen2501/235/orig        -> origin/gh/kwen2501/235/orig
2025-12-04T08:57:06.0027165Z  * [new branch]              gh/kwen2501/236/base        -> origin/gh/kwen2501/236/base
2025-12-04T08:57:06.0028736Z  * [new branch]              gh/kwen2501/236/head        -> origin/gh/kwen2501/236/head
2025-12-04T08:57:06.0030283Z  * [new branch]              gh/kwen2501/236/orig        -> origin/gh/kwen2501/236/orig
2025-12-04T08:57:06.0032456Z  * [new branch]              gh/kwen2501/237/base        -> origin/gh/kwen2501/237/base
2025-12-04T08:57:06.0034466Z  * [new branch]              gh/kwen2501/237/head        -> origin/gh/kwen2501/237/head
2025-12-04T08:57:06.0036054Z  * [new branch]              gh/kwen2501/237/orig        -> origin/gh/kwen2501/237/orig
2025-12-04T08:57:06.0038207Z  * [new branch]              gh/kwen2501/238/base        -> origin/gh/kwen2501/238/base
2025-12-04T08:57:06.0039772Z  * [new branch]              gh/kwen2501/238/head        -> origin/gh/kwen2501/238/head
2025-12-04T08:57:06.0041505Z  * [new branch]              gh/kwen2501/238/orig        -> origin/gh/kwen2501/238/orig
2025-12-04T08:57:06.0043656Z  * [new branch]              gh/kwen2501/240/base        -> origin/gh/kwen2501/240/base
2025-12-04T08:57:06.0045243Z  * [new branch]              gh/kwen2501/240/head        -> origin/gh/kwen2501/240/head
2025-12-04T08:57:06.0046946Z  * [new branch]              gh/kwen2501/240/orig        -> origin/gh/kwen2501/240/orig
2025-12-04T08:57:06.0049040Z  * [new branch]              gh/kwen2501/241/base        -> origin/gh/kwen2501/241/base
2025-12-04T08:57:06.0050502Z  * [new branch]              gh/kwen2501/241/head        -> origin/gh/kwen2501/241/head
2025-12-04T08:57:06.0052079Z  * [new branch]              gh/kwen2501/241/orig        -> origin/gh/kwen2501/241/orig
2025-12-04T08:57:06.0054262Z  * [new branch]              gh/kwen2501/247/base        -> origin/gh/kwen2501/247/base
2025-12-04T08:57:06.0055828Z  * [new branch]              gh/kwen2501/247/head        -> origin/gh/kwen2501/247/head
2025-12-04T08:57:06.0057388Z  * [new branch]              gh/kwen2501/247/orig        -> origin/gh/kwen2501/247/orig
2025-12-04T08:57:06.0059503Z  * [new branch]              gh/kwen2501/252/base        -> origin/gh/kwen2501/252/base
2025-12-04T08:57:06.0061859Z  * [new branch]              gh/kwen2501/252/head        -> origin/gh/kwen2501/252/head
2025-12-04T08:57:06.0063470Z  * [new branch]              gh/kwen2501/252/orig        -> origin/gh/kwen2501/252/orig
2025-12-04T08:57:06.0066111Z  * [new branch]              gh/kwen2501/259/base        -> origin/gh/kwen2501/259/base
2025-12-04T08:57:06.0067772Z  * [new branch]              gh/kwen2501/259/head        -> origin/gh/kwen2501/259/head
2025-12-04T08:57:06.0069407Z  * [new branch]              gh/kwen2501/259/orig        -> origin/gh/kwen2501/259/orig
2025-12-04T08:57:06.0071642Z  * [new branch]              gh/kwen2501/260/base        -> origin/gh/kwen2501/260/base
2025-12-04T08:57:06.0073288Z  * [new branch]              gh/kwen2501/260/head        -> origin/gh/kwen2501/260/head
2025-12-04T08:57:06.0074920Z  * [new branch]              gh/kwen2501/260/orig        -> origin/gh/kwen2501/260/orig
2025-12-04T08:57:06.0077147Z  * [new branch]              gh/kwen2501/268/base        -> origin/gh/kwen2501/268/base
2025-12-04T08:57:06.0078714Z  * [new branch]              gh/kwen2501/268/head        -> origin/gh/kwen2501/268/head
2025-12-04T08:57:06.0080352Z  * [new branch]              gh/kwen2501/268/orig        -> origin/gh/kwen2501/268/orig
2025-12-04T08:57:06.0082523Z  * [new branch]              gh/kwen2501/269/base        -> origin/gh/kwen2501/269/base
2025-12-04T08:57:06.0084177Z  * [new branch]              gh/kwen2501/269/head        -> origin/gh/kwen2501/269/head
2025-12-04T08:57:06.0085759Z  * [new branch]              gh/kwen2501/269/orig        -> origin/gh/kwen2501/269/orig
2025-12-04T08:57:06.0088052Z  * [new branch]              gh/kwen2501/270/base        -> origin/gh/kwen2501/270/base
2025-12-04T08:57:06.0089833Z  * [new branch]              gh/kwen2501/270/head        -> origin/gh/kwen2501/270/head
2025-12-04T08:57:06.0091280Z  * [new branch]              gh/kwen2501/270/orig        -> origin/gh/kwen2501/270/orig
2025-12-04T08:57:06.0093602Z  * [new branch]              gh/kwen2501/271/base        -> origin/gh/kwen2501/271/base
2025-12-04T08:57:06.0095301Z  * [new branch]              gh/kwen2501/271/head        -> origin/gh/kwen2501/271/head
2025-12-04T08:57:06.0096814Z  * [new branch]              gh/kwen2501/271/orig        -> origin/gh/kwen2501/271/orig
2025-12-04T08:57:06.0099132Z  * [new branch]              gh/kwen2501/274/base        -> origin/gh/kwen2501/274/base
2025-12-04T08:57:06.0100709Z  * [new branch]              gh/kwen2501/274/head        -> origin/gh/kwen2501/274/head
2025-12-04T08:57:06.0102408Z  * [new branch]              gh/kwen2501/274/orig        -> origin/gh/kwen2501/274/orig
2025-12-04T08:57:06.0104659Z  * [new branch]              gh/kwen2501/275/base        -> origin/gh/kwen2501/275/base
2025-12-04T08:57:06.0106330Z  * [new branch]              gh/kwen2501/275/head        -> origin/gh/kwen2501/275/head
2025-12-04T08:57:06.0107915Z  * [new branch]              gh/kwen2501/275/orig        -> origin/gh/kwen2501/275/orig
2025-12-04T08:57:06.0110149Z  * [new branch]              gh/kwen2501/276/base        -> origin/gh/kwen2501/276/base
2025-12-04T08:57:06.0111868Z  * [new branch]              gh/kwen2501/276/head        -> origin/gh/kwen2501/276/head
2025-12-04T08:57:06.0113329Z  * [new branch]              gh/kwen2501/276/orig        -> origin/gh/kwen2501/276/orig
2025-12-04T08:57:06.0115920Z  * [new branch]              gh/kwen2501/277/base        -> origin/gh/kwen2501/277/base
2025-12-04T08:57:06.0117823Z  * [new branch]              gh/kwen2501/277/head        -> origin/gh/kwen2501/277/head
2025-12-04T08:57:06.0119403Z  * [new branch]              gh/kwen2501/277/orig        -> origin/gh/kwen2501/277/orig
2025-12-04T08:57:06.0121820Z  * [new branch]              gh/kwen2501/278/base        -> origin/gh/kwen2501/278/base
2025-12-04T08:57:06.0123364Z  * [new branch]              gh/kwen2501/278/head        -> origin/gh/kwen2501/278/head
2025-12-04T08:57:06.0124963Z  * [new branch]              gh/kwen2501/278/orig        -> origin/gh/kwen2501/278/orig
2025-12-04T08:57:06.0127209Z  * [new branch]              gh/kwen2501/279/base        -> origin/gh/kwen2501/279/base
2025-12-04T08:57:06.0129029Z  * [new branch]              gh/kwen2501/279/head        -> origin/gh/kwen2501/279/head
2025-12-04T08:57:06.0130587Z  * [new branch]              gh/kwen2501/279/orig        -> origin/gh/kwen2501/279/orig
2025-12-04T08:57:06.0132797Z  * [new branch]              gh/kwen2501/280/base        -> origin/gh/kwen2501/280/base
2025-12-04T08:57:06.0134399Z  * [new branch]              gh/kwen2501/280/head        -> origin/gh/kwen2501/280/head
2025-12-04T08:57:06.0136012Z  * [new branch]              gh/kwen2501/280/orig        -> origin/gh/kwen2501/280/orig
2025-12-04T08:57:06.0138195Z  * [new branch]              gh/kwen2501/281/base        -> origin/gh/kwen2501/281/base
2025-12-04T08:57:06.0139794Z  * [new branch]              gh/kwen2501/281/head        -> origin/gh/kwen2501/281/head
2025-12-04T08:57:06.0141352Z  * [new branch]              gh/kwen2501/281/orig        -> origin/gh/kwen2501/281/orig
2025-12-04T08:57:06.0143718Z  * [new branch]              gh/kwen2501/282/base        -> origin/gh/kwen2501/282/base
2025-12-04T08:57:06.0145343Z  * [new branch]              gh/kwen2501/282/head        -> origin/gh/kwen2501/282/head
2025-12-04T08:57:06.0146923Z  * [new branch]              gh/kwen2501/282/orig        -> origin/gh/kwen2501/282/orig
2025-12-04T08:57:06.0149082Z  * [new branch]              gh/kwen2501/283/base        -> origin/gh/kwen2501/283/base
2025-12-04T08:57:06.0150641Z  * [new branch]              gh/kwen2501/283/head        -> origin/gh/kwen2501/283/head
2025-12-04T08:57:06.0152256Z  * [new branch]              gh/kwen2501/283/orig        -> origin/gh/kwen2501/283/orig
2025-12-04T08:57:06.0154454Z  * [new branch]              gh/kwen2501/284/base        -> origin/gh/kwen2501/284/base
2025-12-04T08:57:06.0156159Z  * [new branch]              gh/kwen2501/284/head        -> origin/gh/kwen2501/284/head
2025-12-04T08:57:06.0157733Z  * [new branch]              gh/kwen2501/284/orig        -> origin/gh/kwen2501/284/orig
2025-12-04T08:57:06.0160057Z  * [new branch]              gh/kwen2501/285/base        -> origin/gh/kwen2501/285/base
2025-12-04T08:57:06.0161718Z  * [new branch]              gh/kwen2501/285/head        -> origin/gh/kwen2501/285/head
2025-12-04T08:57:06.0163298Z  * [new branch]              gh/kwen2501/285/orig        -> origin/gh/kwen2501/285/orig
2025-12-04T08:57:06.0165555Z  * [new branch]              gh/kwen2501/286/base        -> origin/gh/kwen2501/286/base
2025-12-04T08:57:06.0167160Z  * [new branch]              gh/kwen2501/286/head        -> origin/gh/kwen2501/286/head
2025-12-04T08:57:06.0168761Z  * [new branch]              gh/kwen2501/286/orig        -> origin/gh/kwen2501/286/orig
2025-12-04T08:57:06.0170872Z  * [new branch]              gh/kwen2501/287/base        -> origin/gh/kwen2501/287/base
2025-12-04T08:57:06.0172442Z  * [new branch]              gh/kwen2501/287/head        -> origin/gh/kwen2501/287/head
2025-12-04T08:57:06.0174053Z  * [new branch]              gh/kwen2501/287/orig        -> origin/gh/kwen2501/287/orig
2025-12-04T08:57:06.0176456Z  * [new branch]              gh/kwen2501/288/base        -> origin/gh/kwen2501/288/base
2025-12-04T08:57:06.0177690Z  * [new branch]              gh/kwen2501/288/head        -> origin/gh/kwen2501/288/head
2025-12-04T08:57:06.0179522Z  * [new branch]              gh/kwen2501/288/orig        -> origin/gh/kwen2501/288/orig
2025-12-04T08:57:06.0182588Z  * [new branch]              gh/laithsakka/251/base      -> origin/gh/laithsakka/251/base
2025-12-04T08:57:06.0184184Z  * [new branch]              gh/laithsakka/251/head      -> origin/gh/laithsakka/251/head
2025-12-04T08:57:06.0185851Z  * [new branch]              gh/laithsakka/251/orig      -> origin/gh/laithsakka/251/orig
2025-12-04T08:57:06.0187965Z  * [new branch]              gh/laithsakka/276/base      -> origin/gh/laithsakka/276/base
2025-12-04T08:57:06.0189498Z  * [new branch]              gh/laithsakka/276/head      -> origin/gh/laithsakka/276/head
2025-12-04T08:57:06.0191057Z  * [new branch]              gh/laithsakka/276/orig      -> origin/gh/laithsakka/276/orig
2025-12-04T08:57:06.0193286Z  * [new branch]              gh/laithsakka/28/base       -> origin/gh/laithsakka/28/base
2025-12-04T08:57:06.0195280Z  * [new branch]              gh/laithsakka/29/base       -> origin/gh/laithsakka/29/base
2025-12-04T08:57:06.0197273Z  * [new branch]              gh/laithsakka/30/base       -> origin/gh/laithsakka/30/base
2025-12-04T08:57:06.0198898Z  * [new branch]              gh/laithsakka/30/head       -> origin/gh/laithsakka/30/head
2025-12-04T08:57:06.0200999Z  * [new branch]              gh/laithsakka/31/base       -> origin/gh/laithsakka/31/base
2025-12-04T08:57:06.0202306Z  * [new branch]              gh/laithsakka/31/head       -> origin/gh/laithsakka/31/head
2025-12-04T08:57:06.0204897Z  * [new branch]              gh/laithsakka/313/base      -> origin/gh/laithsakka/313/base
2025-12-04T08:57:06.0206208Z  * [new branch]              gh/laithsakka/313/head      -> origin/gh/laithsakka/313/head
2025-12-04T08:57:06.0208033Z  * [new branch]              gh/laithsakka/313/orig      -> origin/gh/laithsakka/313/orig
2025-12-04T08:57:06.0210311Z  * [new branch]              gh/laithsakka/316/base      -> origin/gh/laithsakka/316/base
2025-12-04T08:57:06.0211809Z  * [new branch]              gh/laithsakka/316/head      -> origin/gh/laithsakka/316/head
2025-12-04T08:57:06.0213310Z  * [new branch]              gh/laithsakka/316/orig      -> origin/gh/laithsakka/316/orig
2025-12-04T08:57:06.0215530Z  * [new branch]              gh/laithsakka/317/base      -> origin/gh/laithsakka/317/base
2025-12-04T08:57:06.0216975Z  * [new branch]              gh/laithsakka/317/head      -> origin/gh/laithsakka/317/head
2025-12-04T08:57:06.0230438Z  * [new branch]              gh/laithsakka/317/orig      -> origin/gh/laithsakka/317/orig
2025-12-04T08:57:06.0230963Z  * [new branch]              gh/laithsakka/319/base      -> origin/gh/laithsakka/319/base
2025-12-04T08:57:06.0231382Z  * [new branch]              gh/laithsakka/319/head      -> origin/gh/laithsakka/319/head
2025-12-04T08:57:06.0231789Z  * [new branch]              gh/laithsakka/319/orig      -> origin/gh/laithsakka/319/orig
2025-12-04T08:57:06.0232211Z  * [new branch]              gh/laithsakka/32/base       -> origin/gh/laithsakka/32/base
2025-12-04T08:57:06.0232610Z  * [new branch]              gh/laithsakka/32/head       -> origin/gh/laithsakka/32/head
2025-12-04T08:57:06.0232997Z  * [new branch]              gh/laithsakka/320/base      -> origin/gh/laithsakka/320/base
2025-12-04T08:57:06.0233389Z  * [new branch]              gh/laithsakka/320/head      -> origin/gh/laithsakka/320/head
2025-12-04T08:57:06.0234836Z  * [new branch]              gh/laithsakka/320/orig      -> origin/gh/laithsakka/320/orig
2025-12-04T08:57:06.0236851Z  * [new branch]              gh/laithsakka/321/base      -> origin/gh/laithsakka/321/base
2025-12-04T08:57:06.0238404Z  * [new branch]              gh/laithsakka/321/head      -> origin/gh/laithsakka/321/head
2025-12-04T08:57:06.0240166Z  * [new branch]              gh/laithsakka/321/orig      -> origin/gh/laithsakka/321/orig
2025-12-04T08:57:06.0242340Z  * [new branch]              gh/laithsakka/322/base      -> origin/gh/laithsakka/322/base
2025-12-04T08:57:06.0243889Z  * [new branch]              gh/laithsakka/322/head      -> origin/gh/laithsakka/322/head
2025-12-04T08:57:06.0245457Z  * [new branch]              gh/laithsakka/322/orig      -> origin/gh/laithsakka/322/orig
2025-12-04T08:57:06.0247640Z  * [new branch]              gh/laithsakka/323/base      -> origin/gh/laithsakka/323/base
2025-12-04T08:57:06.0249463Z  * [new branch]              gh/laithsakka/323/head      -> origin/gh/laithsakka/323/head
2025-12-04T08:57:06.0251052Z  * [new branch]              gh/laithsakka/323/orig      -> origin/gh/laithsakka/323/orig
2025-12-04T08:57:06.0253374Z  * [new branch]              gh/laithsakka/324/base      -> origin/gh/laithsakka/324/base
2025-12-04T08:57:06.0254894Z  * [new branch]              gh/laithsakka/324/head      -> origin/gh/laithsakka/324/head
2025-12-04T08:57:06.0256480Z  * [new branch]              gh/laithsakka/324/orig      -> origin/gh/laithsakka/324/orig
2025-12-04T08:57:06.0258652Z  * [new branch]              gh/laithsakka/325/base      -> origin/gh/laithsakka/325/base
2025-12-04T08:57:06.0260209Z  * [new branch]              gh/laithsakka/325/head      -> origin/gh/laithsakka/325/head
2025-12-04T08:57:06.0261863Z  * [new branch]              gh/laithsakka/325/orig      -> origin/gh/laithsakka/325/orig
2025-12-04T08:57:06.0264316Z  * [new branch]              gh/laithsakka/326/base      -> origin/gh/laithsakka/326/base
2025-12-04T08:57:06.0265916Z  * [new branch]              gh/laithsakka/326/head      -> origin/gh/laithsakka/326/head
2025-12-04T08:57:06.0267518Z  * [new branch]              gh/laithsakka/326/orig      -> origin/gh/laithsakka/326/orig
2025-12-04T08:57:06.0269795Z  * [new branch]              gh/laithsakka/327/base      -> origin/gh/laithsakka/327/base
2025-12-04T08:57:06.0271510Z  * [new branch]              gh/laithsakka/327/head      -> origin/gh/laithsakka/327/head
2025-12-04T08:57:06.0273102Z  * [new branch]              gh/laithsakka/327/orig      -> origin/gh/laithsakka/327/orig
2025-12-04T08:57:06.0275271Z  * [new branch]              gh/laithsakka/328/base      -> origin/gh/laithsakka/328/base
2025-12-04T08:57:06.0276887Z  * [new branch]              gh/laithsakka/328/head      -> origin/gh/laithsakka/328/head
2025-12-04T08:57:06.0278893Z  * [new branch]              gh/laithsakka/328/orig      -> origin/gh/laithsakka/328/orig
2025-12-04T08:57:06.0281621Z  * [new branch]              gh/liangel/4/base           -> origin/gh/liangel/4/base
2025-12-04T08:57:06.0283621Z  * [new branch]              gh/liangel/4/head           -> origin/gh/liangel/4/head
2025-12-04T08:57:06.0285321Z  * [new branch]              gh/liangel/4/orig           -> origin/gh/liangel/4/orig
2025-12-04T08:57:06.0289322Z  * [new branch]              gh/lucaskabela/1/base       -> origin/gh/lucaskabela/1/base
2025-12-04T08:57:06.0290796Z  * [new branch]              gh/lucaskabela/1/head       -> origin/gh/lucaskabela/1/head
2025-12-04T08:57:06.0293460Z  * [new branch]              gh/lw/4/base                -> origin/gh/lw/4/base
2025-12-04T08:57:06.0295053Z  * [new branch]              gh/lw/4/head                -> origin/gh/lw/4/head
2025-12-04T08:57:06.0296624Z  * [new branch]              gh/lw/4/orig                -> origin/gh/lw/4/orig
2025-12-04T08:57:06.0298716Z  * [new branch]              gh/lw/5/base                -> origin/gh/lw/5/base
2025-12-04T08:57:06.0300268Z  * [new branch]              gh/lw/5/head                -> origin/gh/lw/5/head
2025-12-04T08:57:06.0301906Z  * [new branch]              gh/lw/5/orig                -> origin/gh/lw/5/orig
2025-12-04T08:57:06.0303950Z  * [new branch]              gh/lw/6/base                -> origin/gh/lw/6/base
2025-12-04T08:57:06.0305500Z  * [new branch]              gh/lw/6/head                -> origin/gh/lw/6/head
2025-12-04T08:57:06.0307099Z  * [new branch]              gh/lw/6/orig                -> origin/gh/lw/6/orig
2025-12-04T08:57:06.0309717Z  * [new branch]              gh/malfet/14/base           -> origin/gh/malfet/14/base
2025-12-04T08:57:06.0311792Z  * [new branch]              gh/malfet/417/base          -> origin/gh/malfet/417/base
2025-12-04T08:57:06.0313359Z  * [new branch]              gh/malfet/417/head          -> origin/gh/malfet/417/head
2025-12-04T08:57:06.0315018Z  * [new branch]              gh/malfet/417/orig          -> origin/gh/malfet/417/orig
2025-12-04T08:57:06.0317701Z  * [new branch]              gh/malfet/506/base          -> origin/gh/malfet/506/base
2025-12-04T08:57:06.0319394Z  * [new branch]              gh/malfet/506/head          -> origin/gh/malfet/506/head
2025-12-04T08:57:06.0321098Z  * [new branch]              gh/malfet/506/orig          -> origin/gh/malfet/506/orig
2025-12-04T08:57:06.0323381Z  * [new branch]              gh/malfet/517/base          -> origin/gh/malfet/517/base
2025-12-04T08:57:06.0325022Z  * [new branch]              gh/malfet/517/head          -> origin/gh/malfet/517/head
2025-12-04T08:57:06.0327609Z  * [new branch]              gh/malfet/528/base          -> origin/gh/malfet/528/base
2025-12-04T08:57:06.0329193Z  * [new branch]              gh/malfet/528/head          -> origin/gh/malfet/528/head
2025-12-04T08:57:06.0330769Z  * [new branch]              gh/malfet/528/orig          -> origin/gh/malfet/528/orig
2025-12-04T08:57:06.0332881Z  * [new branch]              gh/malfet/537/base          -> origin/gh/malfet/537/base
2025-12-04T08:57:06.0334545Z  * [new branch]              gh/malfet/537/head          -> origin/gh/malfet/537/head
2025-12-04T08:57:06.0336282Z  * [new branch]              gh/malfet/537/orig          -> origin/gh/malfet/537/orig
2025-12-04T08:57:06.0339036Z  * [new branch]              gh/malfet/546/base          -> origin/gh/malfet/546/base
2025-12-04T08:57:06.0340406Z  * [new branch]              gh/malfet/546/head          -> origin/gh/malfet/546/head
2025-12-04T08:57:06.0341871Z  * [new branch]              gh/malfet/546/orig          -> origin/gh/malfet/546/orig
2025-12-04T08:57:06.0343925Z  * [new branch]              gh/malfet/565/base          -> origin/gh/malfet/565/base
2025-12-04T08:57:06.0345475Z  * [new branch]              gh/malfet/565/head          -> origin/gh/malfet/565/head
2025-12-04T08:57:06.0347091Z  * [new branch]              gh/malfet/565/orig          -> origin/gh/malfet/565/orig
2025-12-04T08:57:06.0349188Z  * [new branch]              gh/malfet/575/base          -> origin/gh/malfet/575/base
2025-12-04T08:57:06.0350776Z  * [new branch]              gh/malfet/575/head          -> origin/gh/malfet/575/head
2025-12-04T08:57:06.0352368Z  * [new branch]              gh/malfet/575/orig          -> origin/gh/malfet/575/orig
2025-12-04T08:57:06.0354439Z  * [new branch]              gh/malfet/580/base          -> origin/gh/malfet/580/base
2025-12-04T08:57:06.0356058Z  * [new branch]              gh/malfet/580/head          -> origin/gh/malfet/580/head
2025-12-04T08:57:06.0357636Z  * [new branch]              gh/malfet/580/orig          -> origin/gh/malfet/580/orig
2025-12-04T08:57:06.0359797Z  * [new branch]              gh/malfet/581/base          -> origin/gh/malfet/581/base
2025-12-04T08:57:06.0361525Z  * [new branch]              gh/malfet/581/head          -> origin/gh/malfet/581/head
2025-12-04T08:57:06.0363071Z  * [new branch]              gh/malfet/581/orig          -> origin/gh/malfet/581/orig
2025-12-04T08:57:06.0365079Z  * [new branch]              gh/malfet/583/base          -> origin/gh/malfet/583/base
2025-12-04T08:57:06.0366674Z  * [new branch]              gh/malfet/583/head          -> origin/gh/malfet/583/head
2025-12-04T08:57:06.0368352Z  * [new branch]              gh/malfet/583/orig          -> origin/gh/malfet/583/orig
2025-12-04T08:57:06.0370449Z  * [new branch]              gh/malfet/586/base          -> origin/gh/malfet/586/base
2025-12-04T08:57:06.0372021Z  * [new branch]              gh/malfet/586/head          -> origin/gh/malfet/586/head
2025-12-04T08:57:06.0373768Z  * [new branch]              gh/malfet/586/orig          -> origin/gh/malfet/586/orig
2025-12-04T08:57:06.0376170Z  * [new branch]              gh/malfet/587/base          -> origin/gh/malfet/587/base
2025-12-04T08:57:06.0377765Z  * [new branch]              gh/malfet/587/head          -> origin/gh/malfet/587/head
2025-12-04T08:57:06.0379321Z  * [new branch]              gh/malfet/587/orig          -> origin/gh/malfet/587/orig
2025-12-04T08:57:06.0381449Z  * [new branch]              gh/malfet/588/base          -> origin/gh/malfet/588/base
2025-12-04T08:57:06.0383033Z  * [new branch]              gh/malfet/588/head          -> origin/gh/malfet/588/head
2025-12-04T08:57:06.0384739Z  * [new branch]              gh/malfet/588/orig          -> origin/gh/malfet/588/orig
2025-12-04T08:57:06.0386859Z  * [new branch]              gh/malfet/589/base          -> origin/gh/malfet/589/base
2025-12-04T08:57:06.0388405Z  * [new branch]              gh/malfet/589/head          -> origin/gh/malfet/589/head
2025-12-04T08:57:06.0390562Z  * [new branch]              gh/malfet/589/orig          -> origin/gh/malfet/589/orig
2025-12-04T08:57:06.0392663Z  * [new branch]              gh/malfet/590/base          -> origin/gh/malfet/590/base
2025-12-04T08:57:06.0394240Z  * [new branch]              gh/malfet/590/head          -> origin/gh/malfet/590/head
2025-12-04T08:57:06.0395924Z  * [new branch]              gh/malfet/590/orig          -> origin/gh/malfet/590/orig
2025-12-04T08:57:06.0398452Z  * [new branch]              gh/malfet/591/base          -> origin/gh/malfet/591/base
2025-12-04T08:57:06.0400228Z  * [new branch]              gh/malfet/591/head          -> origin/gh/malfet/591/head
2025-12-04T08:57:06.0401882Z  * [new branch]              gh/malfet/591/orig          -> origin/gh/malfet/591/orig
2025-12-04T08:57:06.0404067Z  * [new branch]              gh/malfet/592/base          -> origin/gh/malfet/592/base
2025-12-04T08:57:06.0405638Z  * [new branch]              gh/malfet/592/head          -> origin/gh/malfet/592/head
2025-12-04T08:57:06.0407259Z  * [new branch]              gh/malfet/592/orig          -> origin/gh/malfet/592/orig
2025-12-04T08:57:06.0409405Z  * [new branch]              gh/malfet/593/base          -> origin/gh/malfet/593/base
2025-12-04T08:57:06.0411036Z  * [new branch]              gh/malfet/593/head          -> origin/gh/malfet/593/head
2025-12-04T08:57:06.0412628Z  * [new branch]              gh/malfet/593/orig          -> origin/gh/malfet/593/orig
2025-12-04T08:57:06.0414768Z  * [new branch]              gh/malfet/594/base          -> origin/gh/malfet/594/base
2025-12-04T08:57:06.0416387Z  * [new branch]              gh/malfet/594/head          -> origin/gh/malfet/594/head
2025-12-04T08:57:06.0418273Z  * [new branch]              gh/malfet/594/orig          -> origin/gh/malfet/594/orig
2025-12-04T08:57:06.0420298Z  * [new branch]              gh/malfet/595/base          -> origin/gh/malfet/595/base
2025-12-04T08:57:06.0421925Z  * [new branch]              gh/malfet/595/head          -> origin/gh/malfet/595/head
2025-12-04T08:57:06.0423580Z  * [new branch]              gh/malfet/595/orig          -> origin/gh/malfet/595/orig
2025-12-04T08:57:06.0425780Z  * [new branch]              gh/malfet/596/base          -> origin/gh/malfet/596/base
2025-12-04T08:57:06.0427290Z  * [new branch]              gh/malfet/596/head          -> origin/gh/malfet/596/head
2025-12-04T08:57:06.0428870Z  * [new branch]              gh/malfet/596/orig          -> origin/gh/malfet/596/orig
2025-12-04T08:57:06.0431036Z  * [new branch]              gh/malfet/597/base          -> origin/gh/malfet/597/base
2025-12-04T08:57:06.0432609Z  * [new branch]              gh/malfet/597/head          -> origin/gh/malfet/597/head
2025-12-04T08:57:06.0434148Z  * [new branch]              gh/malfet/597/orig          -> origin/gh/malfet/597/orig
2025-12-04T08:57:06.0436344Z  * [new branch]              gh/malfet/598/base          -> origin/gh/malfet/598/base
2025-12-04T08:57:06.0437954Z  * [new branch]              gh/malfet/598/head          -> origin/gh/malfet/598/head
2025-12-04T08:57:06.0439658Z  * [new branch]              gh/malfet/598/orig          -> origin/gh/malfet/598/orig
2025-12-04T08:57:06.0441881Z  * [new branch]              gh/malfet/599/base          -> origin/gh/malfet/599/base
2025-12-04T08:57:06.0443481Z  * [new branch]              gh/malfet/599/head          -> origin/gh/malfet/599/head
2025-12-04T08:57:06.0445095Z  * [new branch]              gh/malfet/599/orig          -> origin/gh/malfet/599/orig
2025-12-04T08:57:06.0447264Z  * [new branch]              gh/malfet/600/base          -> origin/gh/malfet/600/base
2025-12-04T08:57:06.0448845Z  * [new branch]              gh/malfet/600/head          -> origin/gh/malfet/600/head
2025-12-04T08:57:06.0450409Z  * [new branch]              gh/malfet/600/orig          -> origin/gh/malfet/600/orig
2025-12-04T08:57:06.0452534Z  * [new branch]              gh/malfet/601/base          -> origin/gh/malfet/601/base
2025-12-04T08:57:06.0454229Z  * [new branch]              gh/malfet/601/head          -> origin/gh/malfet/601/head
2025-12-04T08:57:06.0455832Z  * [new branch]              gh/malfet/601/orig          -> origin/gh/malfet/601/orig
2025-12-04T08:57:06.0458018Z  * [new branch]              gh/malfet/602/base          -> origin/gh/malfet/602/base
2025-12-04T08:57:06.0459698Z  * [new branch]              gh/malfet/602/head          -> origin/gh/malfet/602/head
2025-12-04T08:57:06.0461271Z  * [new branch]              gh/malfet/602/orig          -> origin/gh/malfet/602/orig
2025-12-04T08:57:06.0463362Z  * [new branch]              gh/malfet/603/base          -> origin/gh/malfet/603/base
2025-12-04T08:57:06.0464937Z  * [new branch]              gh/malfet/603/head          -> origin/gh/malfet/603/head
2025-12-04T08:57:06.0466520Z  * [new branch]              gh/malfet/603/orig          -> origin/gh/malfet/603/orig
2025-12-04T08:57:06.0468699Z  * [new branch]              gh/malfet/604/base          -> origin/gh/malfet/604/base
2025-12-04T08:57:06.0470234Z  * [new branch]              gh/malfet/604/head          -> origin/gh/malfet/604/head
2025-12-04T08:57:06.0471869Z  * [new branch]              gh/malfet/604/orig          -> origin/gh/malfet/604/orig
2025-12-04T08:57:06.0474114Z  * [new branch]              gh/malfet/605/base          -> origin/gh/malfet/605/base
2025-12-04T08:57:06.0475672Z  * [new branch]              gh/malfet/605/head          -> origin/gh/malfet/605/head
2025-12-04T08:57:06.0477289Z  * [new branch]              gh/malfet/605/orig          -> origin/gh/malfet/605/orig
2025-12-04T08:57:06.0479477Z  * [new branch]              gh/malfet/606/base          -> origin/gh/malfet/606/base
2025-12-04T08:57:06.0481256Z  * [new branch]              gh/malfet/606/head          -> origin/gh/malfet/606/head
2025-12-04T08:57:06.0482907Z  * [new branch]              gh/malfet/606/orig          -> origin/gh/malfet/606/orig
2025-12-04T08:57:06.0485072Z  * [new branch]              gh/malfet/607/base          -> origin/gh/malfet/607/base
2025-12-04T08:57:06.0486714Z  * [new branch]              gh/malfet/607/head          -> origin/gh/malfet/607/head
2025-12-04T08:57:06.0488349Z  * [new branch]              gh/malfet/607/orig          -> origin/gh/malfet/607/orig
2025-12-04T08:57:06.0490532Z  * [new branch]              gh/malfet/608/base          -> origin/gh/malfet/608/base
2025-12-04T08:57:06.0492088Z  * [new branch]              gh/malfet/608/head          -> origin/gh/malfet/608/head
2025-12-04T08:57:06.0493637Z  * [new branch]              gh/malfet/608/orig          -> origin/gh/malfet/608/orig
2025-12-04T08:57:06.0495908Z  * [new branch]              gh/malfet/609/base          -> origin/gh/malfet/609/base
2025-12-04T08:57:06.0497476Z  * [new branch]              gh/malfet/609/head          -> origin/gh/malfet/609/head
2025-12-04T08:57:06.0499095Z  * [new branch]              gh/malfet/609/orig          -> origin/gh/malfet/609/orig
2025-12-04T08:57:06.0501360Z  * [new branch]              gh/malfet/610/base          -> origin/gh/malfet/610/base
2025-12-04T08:57:06.0503130Z  * [new branch]              gh/malfet/610/head          -> origin/gh/malfet/610/head
2025-12-04T08:57:06.0504561Z  * [new branch]              gh/malfet/610/orig          -> origin/gh/malfet/610/orig
2025-12-04T08:57:06.0506753Z  * [new branch]              gh/malfet/611/base          -> origin/gh/malfet/611/base
2025-12-04T08:57:06.0508531Z  * [new branch]              gh/malfet/611/head          -> origin/gh/malfet/611/head
2025-12-04T08:57:06.0510100Z  * [new branch]              gh/malfet/611/orig          -> origin/gh/malfet/611/orig
2025-12-04T08:57:06.0512100Z  * [new branch]              gh/malfet/612/base          -> origin/gh/malfet/612/base
2025-12-04T08:57:06.0513688Z  * [new branch]              gh/malfet/612/head          -> origin/gh/malfet/612/head
2025-12-04T08:57:06.0515278Z  * [new branch]              gh/malfet/612/orig          -> origin/gh/malfet/612/orig
2025-12-04T08:57:06.0517678Z  * [new branch]              gh/malfet/64/base           -> origin/gh/malfet/64/base
2025-12-04T08:57:06.0519260Z  * [new branch]              gh/malfet/64/head           -> origin/gh/malfet/64/head
2025-12-04T08:57:06.0522062Z  * [new branch]              gh/manuelcandales/11/base   -> origin/gh/manuelcandales/11/base
2025-12-04T08:57:06.0523638Z  * [new branch]              gh/manuelcandales/11/head   -> origin/gh/manuelcandales/11/head
2025-12-04T08:57:06.0525178Z  * [new branch]              gh/manuelcandales/11/orig   -> origin/gh/manuelcandales/11/orig
2025-12-04T08:57:06.0527953Z  * [new branch]              gh/markkm/1/base            -> origin/gh/markkm/1/base
2025-12-04T08:57:06.0530608Z  * [new branch]              gh/masnesral/1/base         -> origin/gh/masnesral/1/base
2025-12-04T08:57:06.0532190Z  * [new branch]              gh/masnesral/1/head         -> origin/gh/masnesral/1/head
2025-12-04T08:57:06.0533797Z  * [new branch]              gh/masnesral/1/orig         -> origin/gh/masnesral/1/orig
2025-12-04T08:57:06.0536352Z  * [new branch]              gh/mhorowitz/0/base         -> origin/gh/mhorowitz/0/base
2025-12-04T08:57:06.0538000Z  * [new branch]              gh/mhorowitz/0/head         -> origin/gh/mhorowitz/0/head
2025-12-04T08:57:06.0540016Z  * [new branch]              gh/mhorowitz/1/base         -> origin/gh/mhorowitz/1/base
2025-12-04T08:57:06.0541585Z  * [new branch]              gh/mhorowitz/1/head         -> origin/gh/mhorowitz/1/head
2025-12-04T08:57:06.0543555Z  * [new branch]              gh/mhorowitz/2/base         -> origin/gh/mhorowitz/2/base
2025-12-04T08:57:06.0545169Z  * [new branch]              gh/mhorowitz/2/head         -> origin/gh/mhorowitz/2/head
2025-12-04T08:57:06.0547213Z  * [new branch]              gh/mhorowitz/3/base         -> origin/gh/mhorowitz/3/base
2025-12-04T08:57:06.0548809Z  * [new branch]              gh/mhorowitz/3/head         -> origin/gh/mhorowitz/3/head
2025-12-04T08:57:06.0550753Z  * [new branch]              gh/mhorowitz/4/base         -> origin/gh/mhorowitz/4/base
2025-12-04T08:57:06.0552766Z  * [new branch]              gh/mhorowitz/4/head         -> origin/gh/mhorowitz/4/head
2025-12-04T08:57:06.0554761Z  * [new branch]              gh/mhorowitz/5/base         -> origin/gh/mhorowitz/5/base
2025-12-04T08:57:06.0556283Z  * [new branch]              gh/mhorowitz/5/head         -> origin/gh/mhorowitz/5/head
2025-12-04T08:57:06.0558273Z  * [new branch]              gh/mhorowitz/6/base         -> origin/gh/mhorowitz/6/base
2025-12-04T08:57:06.0559779Z  * [new branch]              gh/mhorowitz/6/head         -> origin/gh/mhorowitz/6/head
2025-12-04T08:57:06.0562677Z  * [new branch]              gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base
2025-12-04T08:57:06.0564242Z  * [new branch]              gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head
2025-12-04T08:57:06.0566340Z  * [new branch]              gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base
2025-12-04T08:57:06.0567882Z  * [new branch]              gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head
2025-12-04T08:57:06.0570122Z  * [new branch]              gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base
2025-12-04T08:57:06.0571540Z  * [new branch]              gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head
2025-12-04T08:57:06.0573577Z  * [new branch]              gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base
2025-12-04T08:57:06.0575080Z  * [new branch]              gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head
2025-12-04T08:57:06.0577200Z  * [new branch]              gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base
2025-12-04T08:57:06.0578735Z  * [new branch]              gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head
2025-12-04T08:57:06.0580885Z  * [new branch]              gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base
2025-12-04T08:57:06.0582532Z  * [new branch]              gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head
2025-12-04T08:57:06.0584144Z  * [new branch]              gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig
2025-12-04T08:57:06.0586421Z  * [new branch]              gh/mikaylagawarecki/341/base -> origin/gh/mikaylagawarecki/341/base
2025-12-04T08:57:06.0587940Z  * [new branch]              gh/mikaylagawarecki/341/head -> origin/gh/mikaylagawarecki/341/head
2025-12-04T08:57:06.0589603Z  * [new branch]              gh/mikaylagawarecki/341/orig -> origin/gh/mikaylagawarecki/341/orig
2025-12-04T08:57:06.0591863Z  * [new branch]              gh/mikaylagawarecki/342/base -> origin/gh/mikaylagawarecki/342/base
2025-12-04T08:57:06.0593364Z  * [new branch]              gh/mikaylagawarecki/342/head -> origin/gh/mikaylagawarecki/342/head
2025-12-04T08:57:06.0594983Z  * [new branch]              gh/mikaylagawarecki/342/orig -> origin/gh/mikaylagawarecki/342/orig
2025-12-04T08:57:06.0597202Z  * [new branch]              gh/mikaylagawarecki/345/base -> origin/gh/mikaylagawarecki/345/base
2025-12-04T08:57:06.0598791Z  * [new branch]              gh/mikaylagawarecki/345/head -> origin/gh/mikaylagawarecki/345/head
2025-12-04T08:57:06.0600493Z  * [new branch]              gh/mikaylagawarecki/345/orig -> origin/gh/mikaylagawarecki/345/orig
2025-12-04T08:57:06.0602721Z  * [new branch]              gh/mikaylagawarecki/346/base -> origin/gh/mikaylagawarecki/346/base
2025-12-04T08:57:06.0604291Z  * [new branch]              gh/mikaylagawarecki/346/head -> origin/gh/mikaylagawarecki/346/head
2025-12-04T08:57:06.0605928Z  * [new branch]              gh/mikaylagawarecki/346/orig -> origin/gh/mikaylagawarecki/346/orig
2025-12-04T08:57:06.0608116Z  * [new branch]              gh/mikaylagawarecki/347/base -> origin/gh/mikaylagawarecki/347/base
2025-12-04T08:57:06.0609703Z  * [new branch]              gh/mikaylagawarecki/347/head -> origin/gh/mikaylagawarecki/347/head
2025-12-04T08:57:06.0611223Z  * [new branch]              gh/mikaylagawarecki/347/orig -> origin/gh/mikaylagawarecki/347/orig
2025-12-04T08:57:06.0613435Z  * [new branch]              gh/mikaylagawarecki/350/base -> origin/gh/mikaylagawarecki/350/base
2025-12-04T08:57:06.0615012Z  * [new branch]              gh/mikaylagawarecki/350/head -> origin/gh/mikaylagawarecki/350/head
2025-12-04T08:57:06.0616568Z  * [new branch]              gh/mikaylagawarecki/350/orig -> origin/gh/mikaylagawarecki/350/orig
2025-12-04T08:57:06.0619318Z  * [new branch]              gh/mikaylagawarecki/351/base -> origin/gh/mikaylagawarecki/351/base
2025-12-04T08:57:06.0620888Z  * [new branch]              gh/mikaylagawarecki/351/head -> origin/gh/mikaylagawarecki/351/head
2025-12-04T08:57:06.0622514Z  * [new branch]              gh/mikaylagawarecki/351/orig -> origin/gh/mikaylagawarecki/351/orig
2025-12-04T08:57:06.0624767Z  * [new branch]              gh/mikaylagawarecki/352/base -> origin/gh/mikaylagawarecki/352/base
2025-12-04T08:57:06.0626461Z  * [new branch]              gh/mikaylagawarecki/352/head -> origin/gh/mikaylagawarecki/352/head
2025-12-04T08:57:06.0628151Z  * [new branch]              gh/mikaylagawarecki/352/orig -> origin/gh/mikaylagawarecki/352/orig
2025-12-04T08:57:06.0630720Z  * [new branch]              gh/mikaylagawarecki/353/base -> origin/gh/mikaylagawarecki/353/base
2025-12-04T08:57:06.0632723Z  * [new branch]              gh/mikaylagawarecki/353/head -> origin/gh/mikaylagawarecki/353/head
2025-12-04T08:57:06.0634333Z  * [new branch]              gh/mikaylagawarecki/353/orig -> origin/gh/mikaylagawarecki/353/orig
2025-12-04T08:57:06.0636289Z  * [new branch]              gh/mikaylagawarecki/354/base -> origin/gh/mikaylagawarecki/354/base
2025-12-04T08:57:06.0637881Z  * [new branch]              gh/mikaylagawarecki/354/head -> origin/gh/mikaylagawarecki/354/head
2025-12-04T08:57:06.0639491Z  * [new branch]              gh/mikaylagawarecki/354/orig -> origin/gh/mikaylagawarecki/354/orig
2025-12-04T08:57:06.0642330Z  * [new branch]              gh/mikaylagawarecki/356/base -> origin/gh/mikaylagawarecki/356/base
2025-12-04T08:57:06.0643908Z  * [new branch]              gh/mikaylagawarecki/356/head -> origin/gh/mikaylagawarecki/356/head
2025-12-04T08:57:06.0645506Z  * [new branch]              gh/mikaylagawarecki/356/orig -> origin/gh/mikaylagawarecki/356/orig
2025-12-04T08:57:06.0647567Z  * [new branch]              gh/mikaylagawarecki/357/base -> origin/gh/mikaylagawarecki/357/base
2025-12-04T08:57:06.0649185Z  * [new branch]              gh/mikaylagawarecki/357/head -> origin/gh/mikaylagawarecki/357/head
2025-12-04T08:57:06.0650657Z  * [new branch]              gh/mikaylagawarecki/357/orig -> origin/gh/mikaylagawarecki/357/orig
2025-12-04T08:57:06.0653050Z  * [new branch]              gh/mikaylagawarecki/359/base -> origin/gh/mikaylagawarecki/359/base
2025-12-04T08:57:06.0654704Z  * [new branch]              gh/mikaylagawarecki/359/head -> origin/gh/mikaylagawarecki/359/head
2025-12-04T08:57:06.0656333Z  * [new branch]              gh/mikaylagawarecki/359/orig -> origin/gh/mikaylagawarecki/359/orig
2025-12-04T08:57:06.0658477Z  * [new branch]              gh/mikaylagawarecki/360/base -> origin/gh/mikaylagawarecki/360/base
2025-12-04T08:57:06.0660128Z  * [new branch]              gh/mikaylagawarecki/360/head -> origin/gh/mikaylagawarecki/360/head
2025-12-04T08:57:06.0661768Z  * [new branch]              gh/mikaylagawarecki/360/orig -> origin/gh/mikaylagawarecki/360/orig
2025-12-04T08:57:06.0664053Z  * [new branch]              gh/mikaylagawarecki/361/base -> origin/gh/mikaylagawarecki/361/base
2025-12-04T08:57:06.0665640Z  * [new branch]              gh/mikaylagawarecki/361/head -> origin/gh/mikaylagawarecki/361/head
2025-12-04T08:57:06.0667246Z  * [new branch]              gh/mikaylagawarecki/361/orig -> origin/gh/mikaylagawarecki/361/orig
2025-12-04T08:57:06.0669511Z  * [new branch]              gh/mikaylagawarecki/362/base -> origin/gh/mikaylagawarecki/362/base
2025-12-04T08:57:06.0671205Z  * [new branch]              gh/mikaylagawarecki/362/head -> origin/gh/mikaylagawarecki/362/head
2025-12-04T08:57:06.0673014Z  * [new branch]              gh/mikaylagawarecki/362/orig -> origin/gh/mikaylagawarecki/362/orig
2025-12-04T08:57:06.0675418Z  * [new branch]              gh/mikaylagawarecki/363/base -> origin/gh/mikaylagawarecki/363/base
2025-12-04T08:57:06.0677297Z  * [new branch]              gh/mikaylagawarecki/363/head -> origin/gh/mikaylagawarecki/363/head
2025-12-04T08:57:06.0678873Z  * [new branch]              gh/mikaylagawarecki/363/orig -> origin/gh/mikaylagawarecki/363/orig
2025-12-04T08:57:06.0681634Z  * [new branch]              gh/mikaylagawarecki/364/base -> origin/gh/mikaylagawarecki/364/base
2025-12-04T08:57:06.0683173Z  * [new branch]              gh/mikaylagawarecki/364/head -> origin/gh/mikaylagawarecki/364/head
2025-12-04T08:57:06.0684799Z  * [new branch]              gh/mikaylagawarecki/364/orig -> origin/gh/mikaylagawarecki/364/orig
2025-12-04T08:57:06.0687112Z  * [new branch]              gh/mikaylagawarecki/365/base -> origin/gh/mikaylagawarecki/365/base
2025-12-04T08:57:06.0688696Z  * [new branch]              gh/mikaylagawarecki/365/head -> origin/gh/mikaylagawarecki/365/head
2025-12-04T08:57:06.0690471Z  * [new branch]              gh/mikaylagawarecki/365/orig -> origin/gh/mikaylagawarecki/365/orig
2025-12-04T08:57:06.0692581Z  * [new branch]              gh/mikaylagawarecki/366/base -> origin/gh/mikaylagawarecki/366/base
2025-12-04T08:57:06.0694022Z  * [new branch]              gh/mikaylagawarecki/366/head -> origin/gh/mikaylagawarecki/366/head
2025-12-04T08:57:06.0695641Z  * [new branch]              gh/mikaylagawarecki/366/orig -> origin/gh/mikaylagawarecki/366/orig
2025-12-04T08:57:06.0697891Z  * [new branch]              gh/mikaylagawarecki/367/base -> origin/gh/mikaylagawarecki/367/base
2025-12-04T08:57:06.0699594Z  * [new branch]              gh/mikaylagawarecki/367/head -> origin/gh/mikaylagawarecki/367/head
2025-12-04T08:57:06.0701282Z  * [new branch]              gh/mikaylagawarecki/367/orig -> origin/gh/mikaylagawarecki/367/orig
2025-12-04T08:57:06.0703605Z  * [new branch]              gh/mikaylagawarecki/368/base -> origin/gh/mikaylagawarecki/368/base
2025-12-04T08:57:06.0705171Z  * [new branch]              gh/mikaylagawarecki/368/head -> origin/gh/mikaylagawarecki/368/head
2025-12-04T08:57:06.0706862Z  * [new branch]              gh/mikaylagawarecki/368/orig -> origin/gh/mikaylagawarecki/368/orig
2025-12-04T08:57:06.0709070Z  * [new branch]              gh/mikaylagawarecki/369/base -> origin/gh/mikaylagawarecki/369/base
2025-12-04T08:57:06.0710666Z  * [new branch]              gh/mikaylagawarecki/369/head -> origin/gh/mikaylagawarecki/369/head
2025-12-04T08:57:06.0712212Z  * [new branch]              gh/mikaylagawarecki/369/orig -> origin/gh/mikaylagawarecki/369/orig
2025-12-04T08:57:06.0714483Z  * [new branch]              gh/mikaylagawarecki/370/base -> origin/gh/mikaylagawarecki/370/base
2025-12-04T08:57:06.0716089Z  * [new branch]              gh/mikaylagawarecki/370/head -> origin/gh/mikaylagawarecki/370/head
2025-12-04T08:57:06.0717703Z  * [new branch]              gh/mikaylagawarecki/370/orig -> origin/gh/mikaylagawarecki/370/orig
2025-12-04T08:57:06.0722248Z  * [new branch]              gh/mikaylagawarecki/371/base -> origin/gh/mikaylagawarecki/371/base
2025-12-04T08:57:06.0723702Z  * [new branch]              gh/mikaylagawarecki/371/head -> origin/gh/mikaylagawarecki/371/head
2025-12-04T08:57:06.0725299Z  * [new branch]              gh/mikaylagawarecki/371/orig -> origin/gh/mikaylagawarecki/371/orig
2025-12-04T08:57:06.0727526Z  * [new branch]              gh/mikaylagawarecki/372/base -> origin/gh/mikaylagawarecki/372/base
2025-12-04T08:57:06.0729367Z  * [new branch]              gh/mikaylagawarecki/372/head -> origin/gh/mikaylagawarecki/372/head
2025-12-04T08:57:06.0730960Z  * [new branch]              gh/mikaylagawarecki/372/orig -> origin/gh/mikaylagawarecki/372/orig
2025-12-04T08:57:06.0733038Z  * [new branch]              gh/mikaylagawarecki/373/base -> origin/gh/mikaylagawarecki/373/base
2025-12-04T08:57:06.0734616Z  * [new branch]              gh/mikaylagawarecki/373/head -> origin/gh/mikaylagawarecki/373/head
2025-12-04T08:57:06.0736216Z  * [new branch]              gh/mikaylagawarecki/373/orig -> origin/gh/mikaylagawarecki/373/orig
2025-12-04T08:57:06.0738451Z  * [new branch]              gh/mikaylagawarecki/374/base -> origin/gh/mikaylagawarecki/374/base
2025-12-04T08:57:06.0739960Z  * [new branch]              gh/mikaylagawarecki/374/head -> origin/gh/mikaylagawarecki/374/head
2025-12-04T08:57:06.0741534Z  * [new branch]              gh/mikaylagawarecki/374/orig -> origin/gh/mikaylagawarecki/374/orig
2025-12-04T08:57:06.0743806Z  * [new branch]              gh/mikaylagawarecki/375/base -> origin/gh/mikaylagawarecki/375/base
2025-12-04T08:57:06.0745439Z  * [new branch]              gh/mikaylagawarecki/375/head -> origin/gh/mikaylagawarecki/375/head
2025-12-04T08:57:06.0747038Z  * [new branch]              gh/mikaylagawarecki/375/orig -> origin/gh/mikaylagawarecki/375/orig
2025-12-04T08:57:06.0749665Z  * [new branch]              gh/mikaylagawarecki/376/base -> origin/gh/mikaylagawarecki/376/base
2025-12-04T08:57:06.0751921Z  * [new branch]              gh/mikaylagawarecki/376/head -> origin/gh/mikaylagawarecki/376/head
2025-12-04T08:57:06.0753744Z  * [new branch]              gh/mikaylagawarecki/376/orig -> origin/gh/mikaylagawarecki/376/orig
2025-12-04T08:57:06.0755763Z  * [new branch]              gh/mikaylagawarecki/377/base -> origin/gh/mikaylagawarecki/377/base
2025-12-04T08:57:06.0757356Z  * [new branch]              gh/mikaylagawarecki/377/head -> origin/gh/mikaylagawarecki/377/head
2025-12-04T08:57:06.0758991Z  * [new branch]              gh/mikaylagawarecki/377/orig -> origin/gh/mikaylagawarecki/377/orig
2025-12-04T08:57:06.0761332Z  * [new branch]              gh/mikaylagawarecki/378/base -> origin/gh/mikaylagawarecki/378/base
2025-12-04T08:57:06.0762920Z  * [new branch]              gh/mikaylagawarecki/378/head -> origin/gh/mikaylagawarecki/378/head
2025-12-04T08:57:06.0764592Z  * [new branch]              gh/mikaylagawarecki/378/orig -> origin/gh/mikaylagawarecki/378/orig
2025-12-04T08:57:06.0766918Z  * [new branch]              gh/mikaylagawarecki/379/base -> origin/gh/mikaylagawarecki/379/base
2025-12-04T08:57:06.0768458Z  * [new branch]              gh/mikaylagawarecki/379/head -> origin/gh/mikaylagawarecki/379/head
2025-12-04T08:57:06.0770035Z  * [new branch]              gh/mikaylagawarecki/379/orig -> origin/gh/mikaylagawarecki/379/orig
2025-12-04T08:57:06.0772086Z  * [new branch]              gh/mikaylagawarecki/380/base -> origin/gh/mikaylagawarecki/380/base
2025-12-04T08:57:06.0773692Z  * [new branch]              gh/mikaylagawarecki/380/head -> origin/gh/mikaylagawarecki/380/head
2025-12-04T08:57:06.0775245Z  * [new branch]              gh/mikaylagawarecki/380/orig -> origin/gh/mikaylagawarecki/380/orig
2025-12-04T08:57:06.0777294Z  * [new branch]              gh/mikaylagawarecki/381/base -> origin/gh/mikaylagawarecki/381/base
2025-12-04T08:57:06.0778891Z  * [new branch]              gh/mikaylagawarecki/381/head -> origin/gh/mikaylagawarecki/381/head
2025-12-04T08:57:06.0780393Z  * [new branch]              gh/mikaylagawarecki/381/orig -> origin/gh/mikaylagawarecki/381/orig
2025-12-04T08:57:06.0782550Z  * [new branch]              gh/mikaylagawarecki/382/base -> origin/gh/mikaylagawarecki/382/base
2025-12-04T08:57:06.0784115Z  * [new branch]              gh/mikaylagawarecki/382/head -> origin/gh/mikaylagawarecki/382/head
2025-12-04T08:57:06.0785647Z  * [new branch]              gh/mikaylagawarecki/382/orig -> origin/gh/mikaylagawarecki/382/orig
2025-12-04T08:57:06.0788006Z  * [new branch]              gh/mikaylagawarecki/383/base -> origin/gh/mikaylagawarecki/383/base
2025-12-04T08:57:06.0789597Z  * [new branch]              gh/mikaylagawarecki/383/head -> origin/gh/mikaylagawarecki/383/head
2025-12-04T08:57:06.0791221Z  * [new branch]              gh/mikaylagawarecki/383/orig -> origin/gh/mikaylagawarecki/383/orig
2025-12-04T08:57:06.0793950Z  * [new branch]              gh/mikaylagawarecki/384/base -> origin/gh/mikaylagawarecki/384/base
2025-12-04T08:57:06.0795515Z  * [new branch]              gh/mikaylagawarecki/384/head -> origin/gh/mikaylagawarecki/384/head
2025-12-04T08:57:06.0797092Z  * [new branch]              gh/mikaylagawarecki/384/orig -> origin/gh/mikaylagawarecki/384/orig
2025-12-04T08:57:06.0799751Z  * [new branch]              gh/mikaylagawarecki/385/base -> origin/gh/mikaylagawarecki/385/base
2025-12-04T08:57:06.0801528Z  * [new branch]              gh/mikaylagawarecki/385/head -> origin/gh/mikaylagawarecki/385/head
2025-12-04T08:57:06.0803261Z  * [new branch]              gh/mikaylagawarecki/385/orig -> origin/gh/mikaylagawarecki/385/orig
2025-12-04T08:57:06.0805525Z  * [new branch]              gh/mikaylagawarecki/386/base -> origin/gh/mikaylagawarecki/386/base
2025-12-04T08:57:06.0807019Z  * [new branch]              gh/mikaylagawarecki/386/head -> origin/gh/mikaylagawarecki/386/head
2025-12-04T08:57:06.0808590Z  * [new branch]              gh/mikaylagawarecki/386/orig -> origin/gh/mikaylagawarecki/386/orig
2025-12-04T08:57:06.0810792Z  * [new branch]              gh/mikaylagawarecki/387/base -> origin/gh/mikaylagawarecki/387/base
2025-12-04T08:57:06.0812484Z  * [new branch]              gh/mikaylagawarecki/387/head -> origin/gh/mikaylagawarecki/387/head
2025-12-04T08:57:06.0814037Z  * [new branch]              gh/mikaylagawarecki/387/orig -> origin/gh/mikaylagawarecki/387/orig
2025-12-04T08:57:06.0816118Z  * [new branch]              gh/mikaylagawarecki/388/base -> origin/gh/mikaylagawarecki/388/base
2025-12-04T08:57:06.0817878Z  * [new branch]              gh/mikaylagawarecki/388/head -> origin/gh/mikaylagawarecki/388/head
2025-12-04T08:57:06.0819537Z  * [new branch]              gh/mikaylagawarecki/388/orig -> origin/gh/mikaylagawarecki/388/orig
2025-12-04T08:57:06.0821793Z  * [new branch]              gh/mikaylagawarecki/389/base -> origin/gh/mikaylagawarecki/389/base
2025-12-04T08:57:06.0823376Z  * [new branch]              gh/mikaylagawarecki/389/head -> origin/gh/mikaylagawarecki/389/head
2025-12-04T08:57:06.0824895Z  * [new branch]              gh/mikaylagawarecki/389/orig -> origin/gh/mikaylagawarecki/389/orig
2025-12-04T08:57:06.0827150Z  * [new branch]              gh/mikaylagawarecki/390/base -> origin/gh/mikaylagawarecki/390/base
2025-12-04T08:57:06.0828658Z  * [new branch]              gh/mikaylagawarecki/390/head -> origin/gh/mikaylagawarecki/390/head
2025-12-04T08:57:06.0830229Z  * [new branch]              gh/mikaylagawarecki/390/orig -> origin/gh/mikaylagawarecki/390/orig
2025-12-04T08:57:06.0832567Z  * [new branch]              gh/mikaylagawarecki/391/base -> origin/gh/mikaylagawarecki/391/base
2025-12-04T08:57:06.0834327Z  * [new branch]              gh/mikaylagawarecki/391/head -> origin/gh/mikaylagawarecki/391/head
2025-12-04T08:57:06.0835949Z  * [new branch]              gh/mikaylagawarecki/391/orig -> origin/gh/mikaylagawarecki/391/orig
2025-12-04T08:57:06.0838237Z  * [new branch]              gh/mikaylagawarecki/392/base -> origin/gh/mikaylagawarecki/392/base
2025-12-04T08:57:06.0839830Z  * [new branch]              gh/mikaylagawarecki/392/head -> origin/gh/mikaylagawarecki/392/head
2025-12-04T08:57:06.0841575Z  * [new branch]              gh/mikaylagawarecki/392/orig -> origin/gh/mikaylagawarecki/392/orig
2025-12-04T08:57:06.0844091Z  * [new branch]              gh/mlazos/41/base           -> origin/gh/mlazos/41/base
2025-12-04T08:57:06.0845648Z  * [new branch]              gh/mlazos/41/head           -> origin/gh/mlazos/41/head
2025-12-04T08:57:06.0847200Z  * [new branch]              gh/mlazos/41/orig           -> origin/gh/mlazos/41/orig
2025-12-04T08:57:06.0849416Z  * [new branch]              gh/mlazos/42/base           -> origin/gh/mlazos/42/base
2025-12-04T08:57:06.0850948Z  * [new branch]              gh/mlazos/42/head           -> origin/gh/mlazos/42/head
2025-12-04T08:57:06.0852584Z  * [new branch]              gh/mlazos/42/orig           -> origin/gh/mlazos/42/orig
2025-12-04T08:57:06.0854553Z  * [new branch]              gh/mlazos/43/base           -> origin/gh/mlazos/43/base
2025-12-04T08:57:06.0856151Z  * [new branch]              gh/mlazos/43/head           -> origin/gh/mlazos/43/head
2025-12-04T08:57:06.0857799Z  * [new branch]              gh/mlazos/43/orig           -> origin/gh/mlazos/43/orig
2025-12-04T08:57:06.0859788Z  * [new branch]              gh/mlazos/44/base           -> origin/gh/mlazos/44/base
2025-12-04T08:57:06.0861326Z  * [new branch]              gh/mlazos/44/head           -> origin/gh/mlazos/44/head
2025-12-04T08:57:06.0862901Z  * [new branch]              gh/mlazos/44/orig           -> origin/gh/mlazos/44/orig
2025-12-04T08:57:06.0864958Z  * [new branch]              gh/mlazos/47/base           -> origin/gh/mlazos/47/base
2025-12-04T08:57:06.0866557Z  * [new branch]              gh/mlazos/47/head           -> origin/gh/mlazos/47/head
2025-12-04T08:57:06.0868119Z  * [new branch]              gh/mlazos/47/orig           -> origin/gh/mlazos/47/orig
2025-12-04T08:57:06.0870124Z  * [new branch]              gh/mlazos/48/base           -> origin/gh/mlazos/48/base
2025-12-04T08:57:06.0872214Z  * [new branch]              gh/mlazos/48/head           -> origin/gh/mlazos/48/head
2025-12-04T08:57:06.0874041Z  * [new branch]              gh/mlazos/48/orig           -> origin/gh/mlazos/48/orig
2025-12-04T08:57:06.0876021Z  * [new branch]              gh/mlazos/49/base           -> origin/gh/mlazos/49/base
2025-12-04T08:57:06.0877757Z  * [new branch]              gh/mlazos/49/head           -> origin/gh/mlazos/49/head
2025-12-04T08:57:06.0879582Z  * [new branch]              gh/mlazos/49/orig           -> origin/gh/mlazos/49/orig
2025-12-04T08:57:06.0882000Z  * [new branch]              gh/mlazos/50/base           -> origin/gh/mlazos/50/base
2025-12-04T08:57:06.0883061Z  * [new branch]              gh/mlazos/50/head           -> origin/gh/mlazos/50/head
2025-12-04T08:57:06.0884958Z  * [new branch]              gh/mlazos/50/orig           -> origin/gh/mlazos/50/orig
2025-12-04T08:57:06.0887018Z  * [new branch]              gh/mlazos/51/base           -> origin/gh/mlazos/51/base
2025-12-04T08:57:06.0888576Z  * [new branch]              gh/mlazos/51/head           -> origin/gh/mlazos/51/head
2025-12-04T08:57:06.0890217Z  * [new branch]              gh/mlazos/51/orig           -> origin/gh/mlazos/51/orig
2025-12-04T08:57:06.0892401Z  * [new branch]              gh/mlazos/52/base           -> origin/gh/mlazos/52/base
2025-12-04T08:57:06.0893983Z  * [new branch]              gh/mlazos/52/head           -> origin/gh/mlazos/52/head
2025-12-04T08:57:06.0895785Z  * [new branch]              gh/mlazos/52/orig           -> origin/gh/mlazos/52/orig
2025-12-04T08:57:06.0898009Z  * [new branch]              gh/mlazos/53/base           -> origin/gh/mlazos/53/base
2025-12-04T08:57:06.0899597Z  * [new branch]              gh/mlazos/53/head           -> origin/gh/mlazos/53/head
2025-12-04T08:57:06.0901228Z  * [new branch]              gh/mlazos/53/orig           -> origin/gh/mlazos/53/orig
2025-12-04T08:57:06.0903232Z  * [new branch]              gh/mlazos/54/base           -> origin/gh/mlazos/54/base
2025-12-04T08:57:06.0904780Z  * [new branch]              gh/mlazos/54/head           -> origin/gh/mlazos/54/head
2025-12-04T08:57:06.0906341Z  * [new branch]              gh/mlazos/54/orig           -> origin/gh/mlazos/54/orig
2025-12-04T08:57:06.0908430Z  * [new branch]              gh/mlazos/55/base           -> origin/gh/mlazos/55/base
2025-12-04T08:57:06.0909972Z  * [new branch]              gh/mlazos/55/head           -> origin/gh/mlazos/55/head
2025-12-04T08:57:06.0911561Z  * [new branch]              gh/mlazos/55/orig           -> origin/gh/mlazos/55/orig
2025-12-04T08:57:06.0913731Z  * [new branch]              gh/mlazos/56/base           -> origin/gh/mlazos/56/base
2025-12-04T08:57:06.0915352Z  * [new branch]              gh/mlazos/56/head           -> origin/gh/mlazos/56/head
2025-12-04T08:57:06.0917130Z  * [new branch]              gh/mlazos/56/orig           -> origin/gh/mlazos/56/orig
2025-12-04T08:57:06.0919301Z  * [new branch]              gh/mlazos/57/base           -> origin/gh/mlazos/57/base
2025-12-04T08:57:06.0921006Z  * [new branch]              gh/mlazos/57/head           -> origin/gh/mlazos/57/head
2025-12-04T08:57:06.0922516Z  * [new branch]              gh/mlazos/57/orig           -> origin/gh/mlazos/57/orig
2025-12-04T08:57:06.0924978Z  * [new branch]              gh/mlazos/58/base           -> origin/gh/mlazos/58/base
2025-12-04T08:57:06.0926635Z  * [new branch]              gh/mlazos/58/head           -> origin/gh/mlazos/58/head
2025-12-04T08:57:06.0928248Z  * [new branch]              gh/mlazos/58/orig           -> origin/gh/mlazos/58/orig
2025-12-04T08:57:06.0930506Z  * [new branch]              gh/mlazos/59/base           -> origin/gh/mlazos/59/base
2025-12-04T08:57:06.0932041Z  * [new branch]              gh/mlazos/59/head           -> origin/gh/mlazos/59/head
2025-12-04T08:57:06.0933520Z  * [new branch]              gh/mlazos/59/orig           -> origin/gh/mlazos/59/orig
2025-12-04T08:57:06.0935670Z  * [new branch]              gh/mlazos/60/base           -> origin/gh/mlazos/60/base
2025-12-04T08:57:06.0937242Z  * [new branch]              gh/mlazos/60/head           -> origin/gh/mlazos/60/head
2025-12-04T08:57:06.0939227Z  * [new branch]              gh/mlazos/60/orig           -> origin/gh/mlazos/60/orig
2025-12-04T08:57:06.0941593Z  * [new branch]              gh/mlazos/61/base           -> origin/gh/mlazos/61/base
2025-12-04T08:57:06.0943230Z  * [new branch]              gh/mlazos/61/head           -> origin/gh/mlazos/61/head
2025-12-04T08:57:06.0945895Z  * [new branch]              gh/mlazos/61/orig           -> origin/gh/mlazos/61/orig
2025-12-04T08:57:06.0947713Z  * [new branch]              gh/mlazos/62/base           -> origin/gh/mlazos/62/base
2025-12-04T08:57:06.0948681Z  * [new branch]              gh/mlazos/62/head           -> origin/gh/mlazos/62/head
2025-12-04T08:57:06.0950382Z  * [new branch]              gh/mlazos/62/orig           -> origin/gh/mlazos/62/orig
2025-12-04T08:57:06.0952617Z  * [new branch]              gh/mlazos/63/base           -> origin/gh/mlazos/63/base
2025-12-04T08:57:06.0954256Z  * [new branch]              gh/mlazos/63/head           -> origin/gh/mlazos/63/head
2025-12-04T08:57:06.0955896Z  * [new branch]              gh/mlazos/63/orig           -> origin/gh/mlazos/63/orig
2025-12-04T08:57:06.0958052Z  * [new branch]              gh/mlazos/64/base           -> origin/gh/mlazos/64/base
2025-12-04T08:57:06.0959529Z  * [new branch]              gh/mlazos/64/head           -> origin/gh/mlazos/64/head
2025-12-04T08:57:06.0961407Z  * [new branch]              gh/mlazos/64/orig           -> origin/gh/mlazos/64/orig
2025-12-04T08:57:06.0963972Z  * [new branch]              gh/mlazos/65/base           -> origin/gh/mlazos/65/base
2025-12-04T08:57:06.0965588Z  * [new branch]              gh/mlazos/65/head           -> origin/gh/mlazos/65/head
2025-12-04T08:57:06.0967117Z  * [new branch]              gh/mlazos/65/orig           -> origin/gh/mlazos/65/orig
2025-12-04T08:57:06.0969275Z  * [new branch]              gh/mlazos/66/base           -> origin/gh/mlazos/66/base
2025-12-04T08:57:06.0970948Z  * [new branch]              gh/mlazos/66/head           -> origin/gh/mlazos/66/head
2025-12-04T08:57:06.0972505Z  * [new branch]              gh/mlazos/66/orig           -> origin/gh/mlazos/66/orig
2025-12-04T08:57:06.0974655Z  * [new branch]              gh/mlazos/67/base           -> origin/gh/mlazos/67/base
2025-12-04T08:57:06.0976217Z  * [new branch]              gh/mlazos/67/head           -> origin/gh/mlazos/67/head
2025-12-04T08:57:06.0977817Z  * [new branch]              gh/mlazos/67/orig           -> origin/gh/mlazos/67/orig
2025-12-04T08:57:06.0979935Z  * [new branch]              gh/mlazos/68/base           -> origin/gh/mlazos/68/base
2025-12-04T08:57:06.0981540Z  * [new branch]              gh/mlazos/68/head           -> origin/gh/mlazos/68/head
2025-12-04T08:57:06.0983167Z  * [new branch]              gh/mlazos/68/orig           -> origin/gh/mlazos/68/orig
2025-12-04T08:57:06.0985401Z  * [new branch]              gh/mlazos/69/base           -> origin/gh/mlazos/69/base
2025-12-04T08:57:06.0986998Z  * [new branch]              gh/mlazos/69/head           -> origin/gh/mlazos/69/head
2025-12-04T08:57:06.0988561Z  * [new branch]              gh/mlazos/69/orig           -> origin/gh/mlazos/69/orig
2025-12-04T08:57:06.0990699Z  * [new branch]              gh/mlazos/70/base           -> origin/gh/mlazos/70/base
2025-12-04T08:57:06.0992393Z  * [new branch]              gh/mlazos/70/head           -> origin/gh/mlazos/70/head
2025-12-04T08:57:06.0993949Z  * [new branch]              gh/mlazos/70/orig           -> origin/gh/mlazos/70/orig
2025-12-04T08:57:06.0996137Z  * [new branch]              gh/mlazos/71/base           -> origin/gh/mlazos/71/base
2025-12-04T08:57:06.0997715Z  * [new branch]              gh/mlazos/71/head           -> origin/gh/mlazos/71/head
2025-12-04T08:57:06.0999332Z  * [new branch]              gh/mlazos/71/orig           -> origin/gh/mlazos/71/orig
2025-12-04T08:57:06.1001779Z  * [new branch]              gh/mlazos/72/base           -> origin/gh/mlazos/72/base
2025-12-04T08:57:06.1003329Z  * [new branch]              gh/mlazos/72/head           -> origin/gh/mlazos/72/head
2025-12-04T08:57:06.1005072Z  * [new branch]              gh/mlazos/72/orig           -> origin/gh/mlazos/72/orig
2025-12-04T08:57:06.1007175Z  * [new branch]              gh/mlazos/73/base           -> origin/gh/mlazos/73/base
2025-12-04T08:57:06.1008814Z  * [new branch]              gh/mlazos/73/head           -> origin/gh/mlazos/73/head
2025-12-04T08:57:06.1010245Z  * [new branch]              gh/mlazos/73/orig           -> origin/gh/mlazos/73/orig
2025-12-04T08:57:06.1012905Z  * [new branch]              gh/mrmiywj/1/base           -> origin/gh/mrmiywj/1/base
2025-12-04T08:57:06.1014583Z  * [new branch]              gh/mrmiywj/1/head           -> origin/gh/mrmiywj/1/head
2025-12-04T08:57:06.1017334Z  * [new branch]              gh/muchulee8/73/base        -> origin/gh/muchulee8/73/base
2025-12-04T08:57:06.1020769Z  * [new branch]              gh/muchulee8/73/head        -> origin/gh/muchulee8/73/head
2025-12-04T08:57:06.1022374Z  * [new branch]              gh/muchulee8/73/orig        -> origin/gh/muchulee8/73/orig
2025-12-04T08:57:06.1025086Z  * [new branch]              gh/naveenthangudu/1/base    -> origin/gh/naveenthangudu/1/base
2025-12-04T08:57:06.1026683Z  * [new branch]              gh/naveenthangudu/1/head    -> origin/gh/naveenthangudu/1/head
2025-12-04T08:57:06.1028388Z  * [new branch]              gh/naveenthangudu/1/orig    -> origin/gh/naveenthangudu/1/orig
2025-12-04T08:57:06.1030498Z  * [new branch]              gh/naveenthangudu/2/base    -> origin/gh/naveenthangudu/2/base
2025-12-04T08:57:06.1032089Z  * [new branch]              gh/naveenthangudu/2/head    -> origin/gh/naveenthangudu/2/head
2025-12-04T08:57:06.1033686Z  * [new branch]              gh/naveenthangudu/2/orig    -> origin/gh/naveenthangudu/2/orig
2025-12-04T08:57:06.1035726Z  * [new branch]              gh/naveenthangudu/3/base    -> origin/gh/naveenthangudu/3/base
2025-12-04T08:57:06.1037325Z  * [new branch]              gh/naveenthangudu/3/head    -> origin/gh/naveenthangudu/3/head
2025-12-04T08:57:06.1039034Z  * [new branch]              gh/naveenthangudu/3/orig    -> origin/gh/naveenthangudu/3/orig
2025-12-04T08:57:06.1041364Z  * [new branch]              gh/naveenthangudu/4/base    -> origin/gh/naveenthangudu/4/base
2025-12-04T08:57:06.1042915Z  * [new branch]              gh/naveenthangudu/4/head    -> origin/gh/naveenthangudu/4/head
2025-12-04T08:57:06.1044531Z  * [new branch]              gh/naveenthangudu/4/orig    -> origin/gh/naveenthangudu/4/orig
2025-12-04T08:57:06.1046802Z  * [new branch]              gh/naveenthangudu/5/base    -> origin/gh/naveenthangudu/5/base
2025-12-04T08:57:06.1048502Z  * [new branch]              gh/naveenthangudu/5/head    -> origin/gh/naveenthangudu/5/head
2025-12-04T08:57:06.1050237Z  * [new branch]              gh/naveenthangudu/5/orig    -> origin/gh/naveenthangudu/5/orig
2025-12-04T08:57:06.1052342Z  * [new branch]              gh/naveenthangudu/6/base    -> origin/gh/naveenthangudu/6/base
2025-12-04T08:57:06.1054035Z  * [new branch]              gh/naveenthangudu/6/head    -> origin/gh/naveenthangudu/6/head
2025-12-04T08:57:06.1055517Z  * [new branch]              gh/naveenthangudu/6/orig    -> origin/gh/naveenthangudu/6/orig
2025-12-04T08:57:06.1057596Z  * [new branch]              gh/naveenthangudu/7/base    -> origin/gh/naveenthangudu/7/base
2025-12-04T08:57:06.1059214Z  * [new branch]              gh/naveenthangudu/7/head    -> origin/gh/naveenthangudu/7/head
2025-12-04T08:57:06.1060690Z  * [new branch]              gh/naveenthangudu/7/orig    -> origin/gh/naveenthangudu/7/orig
2025-12-04T08:57:06.1062749Z  * [new branch]              gh/naveenthangudu/8/base    -> origin/gh/naveenthangudu/8/base
2025-12-04T08:57:06.1064320Z  * [new branch]              gh/naveenthangudu/8/head    -> origin/gh/naveenthangudu/8/head
2025-12-04T08:57:06.1065998Z  * [new branch]              gh/naveenthangudu/8/orig    -> origin/gh/naveenthangudu/8/orig
2025-12-04T08:57:06.1068199Z  * [new branch]              gh/naveenthangudu/9/base    -> origin/gh/naveenthangudu/9/base
2025-12-04T08:57:06.1069772Z  * [new branch]              gh/naveenthangudu/9/head    -> origin/gh/naveenthangudu/9/head
2025-12-04T08:57:06.1071608Z  * [new branch]              gh/naveenthangudu/9/orig    -> origin/gh/naveenthangudu/9/orig
2025-12-04T08:57:06.1073933Z  * [new branch]              gh/nikitaved/1/base         -> origin/gh/nikitaved/1/base
2025-12-04T08:57:06.1075486Z  * [new branch]              gh/nikitaved/1/head         -> origin/gh/nikitaved/1/head
2025-12-04T08:57:06.1077051Z  * [new branch]              gh/nikitaved/1/orig         -> origin/gh/nikitaved/1/orig
2025-12-04T08:57:06.1079207Z  * [new branch]              gh/nikitaved/10/base        -> origin/gh/nikitaved/10/base
2025-12-04T08:57:06.1081391Z  * [new branch]              gh/nikitaved/10/head        -> origin/gh/nikitaved/10/head
2025-12-04T08:57:06.1082878Z  * [new branch]              gh/nikitaved/10/orig        -> origin/gh/nikitaved/10/orig
2025-12-04T08:57:06.1084895Z  * [new branch]              gh/nikitaved/11/base        -> origin/gh/nikitaved/11/base
2025-12-04T08:57:06.1086526Z  * [new branch]              gh/nikitaved/11/head        -> origin/gh/nikitaved/11/head
2025-12-04T08:57:06.1088084Z  * [new branch]              gh/nikitaved/11/orig        -> origin/gh/nikitaved/11/orig
2025-12-04T08:57:06.1090153Z  * [new branch]              gh/nikitaved/12/base        -> origin/gh/nikitaved/12/base
2025-12-04T08:57:06.1091736Z  * [new branch]              gh/nikitaved/12/head        -> origin/gh/nikitaved/12/head
2025-12-04T08:57:06.1093279Z  * [new branch]              gh/nikitaved/12/orig        -> origin/gh/nikitaved/12/orig
2025-12-04T08:57:06.1095410Z  * [new branch]              gh/nikitaved/13/base        -> origin/gh/nikitaved/13/base
2025-12-04T08:57:06.1097140Z  * [new branch]              gh/nikitaved/13/head        -> origin/gh/nikitaved/13/head
2025-12-04T08:57:06.1098706Z  * [new branch]              gh/nikitaved/13/orig        -> origin/gh/nikitaved/13/orig
2025-12-04T08:57:06.1101324Z  * [new branch]              gh/nikitaved/14/base        -> origin/gh/nikitaved/14/base
2025-12-04T08:57:06.1102959Z  * [new branch]              gh/nikitaved/14/head        -> origin/gh/nikitaved/14/head
2025-12-04T08:57:06.1104543Z  * [new branch]              gh/nikitaved/14/orig        -> origin/gh/nikitaved/14/orig
2025-12-04T08:57:06.1106566Z  * [new branch]              gh/nikitaved/15/base        -> origin/gh/nikitaved/15/base
2025-12-04T08:57:06.1108153Z  * [new branch]              gh/nikitaved/15/head        -> origin/gh/nikitaved/15/head
2025-12-04T08:57:06.1109773Z  * [new branch]              gh/nikitaved/15/orig        -> origin/gh/nikitaved/15/orig
2025-12-04T08:57:06.1111933Z  * [new branch]              gh/nikitaved/16/base        -> origin/gh/nikitaved/16/base
2025-12-04T08:57:06.1113524Z  * [new branch]              gh/nikitaved/16/head        -> origin/gh/nikitaved/16/head
2025-12-04T08:57:06.1115131Z  * [new branch]              gh/nikitaved/16/orig        -> origin/gh/nikitaved/16/orig
2025-12-04T08:57:06.1117518Z  * [new branch]              gh/nikitaved/2/base         -> origin/gh/nikitaved/2/base
2025-12-04T08:57:06.1119255Z  * [new branch]              gh/nikitaved/2/head         -> origin/gh/nikitaved/2/head
2025-12-04T08:57:06.1120943Z  * [new branch]              gh/nikitaved/2/orig         -> origin/gh/nikitaved/2/orig
2025-12-04T08:57:06.1122987Z  * [new branch]              gh/nikitaved/4/base         -> origin/gh/nikitaved/4/base
2025-12-04T08:57:06.1124550Z  * [new branch]              gh/nikitaved/4/head         -> origin/gh/nikitaved/4/head
2025-12-04T08:57:06.1126150Z  * [new branch]              gh/nikitaved/4/orig         -> origin/gh/nikitaved/4/orig
2025-12-04T08:57:06.1128322Z  * [new branch]              gh/nikitaved/5/base         -> origin/gh/nikitaved/5/base
2025-12-04T08:57:06.1129915Z  * [new branch]              gh/nikitaved/5/head         -> origin/gh/nikitaved/5/head
2025-12-04T08:57:06.1131465Z  * [new branch]              gh/nikitaved/5/orig         -> origin/gh/nikitaved/5/orig
2025-12-04T08:57:06.1133623Z  * [new branch]              gh/nikitaved/6/base         -> origin/gh/nikitaved/6/base
2025-12-04T08:57:06.1135372Z  * [new branch]              gh/nikitaved/6/head         -> origin/gh/nikitaved/6/head
2025-12-04T08:57:06.1136803Z  * [new branch]              gh/nikitaved/6/orig         -> origin/gh/nikitaved/6/orig
2025-12-04T08:57:06.1138959Z  * [new branch]              gh/nikitaved/8/base         -> origin/gh/nikitaved/8/base
2025-12-04T08:57:06.1140493Z  * [new branch]              gh/nikitaved/8/head         -> origin/gh/nikitaved/8/head
2025-12-04T08:57:06.1142086Z  * [new branch]              gh/nikitaved/8/orig         -> origin/gh/nikitaved/8/orig
2025-12-04T08:57:06.1144174Z  * [new branch]              gh/nikitaved/9/base         -> origin/gh/nikitaved/9/base
2025-12-04T08:57:06.1145781Z  * [new branch]              gh/nikitaved/9/head         -> origin/gh/nikitaved/9/head
2025-12-04T08:57:06.1147308Z  * [new branch]              gh/nikitaved/9/orig         -> origin/gh/nikitaved/9/orig
2025-12-04T08:57:06.1149933Z  * [new branch]              gh/oulgen/10/base           -> origin/gh/oulgen/10/base
2025-12-04T08:57:06.1151589Z  * [new branch]              gh/oulgen/10/head           -> origin/gh/oulgen/10/head
2025-12-04T08:57:06.1153652Z  * [new branch]              gh/oulgen/10/orig           -> origin/gh/oulgen/10/orig
2025-12-04T08:57:06.1155761Z  * [new branch]              gh/oulgen/11/base           -> origin/gh/oulgen/11/base
2025-12-04T08:57:06.1157343Z  * [new branch]              gh/oulgen/11/head           -> origin/gh/oulgen/11/head
2025-12-04T08:57:06.1158900Z  * [new branch]              gh/oulgen/11/orig           -> origin/gh/oulgen/11/orig
2025-12-04T08:57:06.1161433Z  * [new branch]              gh/oulgen/12/base           -> origin/gh/oulgen/12/base
2025-12-04T08:57:06.1162896Z  * [new branch]              gh/oulgen/12/head           -> origin/gh/oulgen/12/head
2025-12-04T08:57:06.1164463Z  * [new branch]              gh/oulgen/12/orig           -> origin/gh/oulgen/12/orig
2025-12-04T08:57:06.1166565Z  * [new branch]              gh/oulgen/13/base           -> origin/gh/oulgen/13/base
2025-12-04T08:57:06.1168170Z  * [new branch]              gh/oulgen/13/head           -> origin/gh/oulgen/13/head
2025-12-04T08:57:06.1169603Z  * [new branch]              gh/oulgen/13/orig           -> origin/gh/oulgen/13/orig
2025-12-04T08:57:06.1171678Z  * [new branch]              gh/oulgen/14/base           -> origin/gh/oulgen/14/base
2025-12-04T08:57:06.1173262Z  * [new branch]              gh/oulgen/14/head           -> origin/gh/oulgen/14/head
2025-12-04T08:57:06.1174963Z  * [new branch]              gh/oulgen/14/orig           -> origin/gh/oulgen/14/orig
2025-12-04T08:57:06.1177052Z  * [new branch]              gh/oulgen/15/base           -> origin/gh/oulgen/15/base
2025-12-04T08:57:06.1179008Z  * [new branch]              gh/oulgen/15/head           -> origin/gh/oulgen/15/head
2025-12-04T08:57:06.1180240Z  * [new branch]              gh/oulgen/15/orig           -> origin/gh/oulgen/15/orig
2025-12-04T08:57:06.1182673Z  * [new branch]              gh/oulgen/16/base           -> origin/gh/oulgen/16/base
2025-12-04T08:57:06.1183864Z  * [new branch]              gh/oulgen/16/head           -> origin/gh/oulgen/16/head
2025-12-04T08:57:06.1185739Z  * [new branch]              gh/oulgen/16/orig           -> origin/gh/oulgen/16/orig
2025-12-04T08:57:06.1187813Z  * [new branch]              gh/oulgen/17/base           -> origin/gh/oulgen/17/base
2025-12-04T08:57:06.1189257Z  * [new branch]              gh/oulgen/17/head           -> origin/gh/oulgen/17/head
2025-12-04T08:57:06.1190909Z  * [new branch]              gh/oulgen/17/orig           -> origin/gh/oulgen/17/orig
2025-12-04T08:57:06.1193018Z  * [new branch]              gh/oulgen/18/base           -> origin/gh/oulgen/18/base
2025-12-04T08:57:06.1194648Z  * [new branch]              gh/oulgen/18/head           -> origin/gh/oulgen/18/head
2025-12-04T08:57:06.1196369Z  * [new branch]              gh/oulgen/18/orig           -> origin/gh/oulgen/18/orig
2025-12-04T08:57:06.1198675Z  * [new branch]              gh/oulgen/19/base           -> origin/gh/oulgen/19/base
2025-12-04T08:57:06.1200692Z  * [new branch]              gh/oulgen/19/head           -> origin/gh/oulgen/19/head
2025-12-04T08:57:06.1202141Z  * [new branch]              gh/oulgen/19/orig           -> origin/gh/oulgen/19/orig
2025-12-04T08:57:06.1204169Z  * [new branch]              gh/oulgen/20/base           -> origin/gh/oulgen/20/base
2025-12-04T08:57:06.1205705Z  * [new branch]              gh/oulgen/20/head           -> origin/gh/oulgen/20/head
2025-12-04T08:57:06.1207299Z  * [new branch]              gh/oulgen/20/orig           -> origin/gh/oulgen/20/orig
2025-12-04T08:57:06.1209326Z  * [new branch]              gh/oulgen/21/base           -> origin/gh/oulgen/21/base
2025-12-04T08:57:06.1210988Z  * [new branch]              gh/oulgen/21/head           -> origin/gh/oulgen/21/head
2025-12-04T08:57:06.1212522Z  * [new branch]              gh/oulgen/21/orig           -> origin/gh/oulgen/21/orig
2025-12-04T08:57:06.1215057Z  * [new branch]              gh/oulgen/22/base           -> origin/gh/oulgen/22/base
2025-12-04T08:57:06.1216645Z  * [new branch]              gh/oulgen/22/head           -> origin/gh/oulgen/22/head
2025-12-04T08:57:06.1218587Z  * [new branch]              gh/oulgen/22/orig           -> origin/gh/oulgen/22/orig
2025-12-04T08:57:06.1220566Z  * [new branch]              gh/oulgen/23/base           -> origin/gh/oulgen/23/base
2025-12-04T08:57:06.1222176Z  * [new branch]              gh/oulgen/23/head           -> origin/gh/oulgen/23/head
2025-12-04T08:57:06.1223708Z  * [new branch]              gh/oulgen/23/orig           -> origin/gh/oulgen/23/orig
2025-12-04T08:57:06.1225737Z  * [new branch]              gh/oulgen/24/base           -> origin/gh/oulgen/24/base
2025-12-04T08:57:06.1227309Z  * [new branch]              gh/oulgen/24/head           -> origin/gh/oulgen/24/head
2025-12-04T08:57:06.1228909Z  * [new branch]              gh/oulgen/24/orig           -> origin/gh/oulgen/24/orig
2025-12-04T08:57:06.1230961Z  * [new branch]              gh/oulgen/25/base           -> origin/gh/oulgen/25/base
2025-12-04T08:57:06.1232508Z  * [new branch]              gh/oulgen/25/head           -> origin/gh/oulgen/25/head
2025-12-04T08:57:06.1234118Z  * [new branch]              gh/oulgen/25/orig           -> origin/gh/oulgen/25/orig
2025-12-04T08:57:06.1236340Z  * [new branch]              gh/oulgen/26/base           -> origin/gh/oulgen/26/base
2025-12-04T08:57:06.1237834Z  * [new branch]              gh/oulgen/26/head           -> origin/gh/oulgen/26/head
2025-12-04T08:57:06.1239505Z  * [new branch]              gh/oulgen/26/orig           -> origin/gh/oulgen/26/orig
2025-12-04T08:57:06.1241860Z  * [new branch]              gh/oulgen/4/base            -> origin/gh/oulgen/4/base
2025-12-04T08:57:06.1243412Z  * [new branch]              gh/oulgen/4/head            -> origin/gh/oulgen/4/head
2025-12-04T08:57:06.1244912Z  * [new branch]              gh/oulgen/4/orig            -> origin/gh/oulgen/4/orig
2025-12-04T08:57:06.1247474Z  * [new branch]              gh/oulgen/7/base            -> origin/gh/oulgen/7/base
2025-12-04T08:57:06.1249051Z  * [new branch]              gh/oulgen/7/head            -> origin/gh/oulgen/7/head
2025-12-04T08:57:06.1251131Z  * [new branch]              gh/oulgen/7/orig            -> origin/gh/oulgen/7/orig
2025-12-04T08:57:06.1253698Z  * [new branch]              gh/oulgen/8/base            -> origin/gh/oulgen/8/base
2025-12-04T08:57:06.1255327Z  * [new branch]              gh/oulgen/8/head            -> origin/gh/oulgen/8/head
2025-12-04T08:57:06.1256885Z  * [new branch]              gh/oulgen/8/orig            -> origin/gh/oulgen/8/orig
2025-12-04T08:57:06.1259154Z  * [new branch]              gh/oulgen/9/base            -> origin/gh/oulgen/9/base
2025-12-04T08:57:06.1260762Z  * [new branch]              gh/oulgen/9/head            -> origin/gh/oulgen/9/head
2025-12-04T08:57:06.1262444Z  * [new branch]              gh/oulgen/9/orig            -> origin/gh/oulgen/9/orig
2025-12-04T08:57:06.1265273Z  * [new branch]              gh/patvig/mtia-serialization -> origin/gh/patvig/mtia-serialization
2025-12-04T08:57:06.1268192Z  * [new branch]              gh/pearu/108/base           -> origin/gh/pearu/108/base
2025-12-04T08:57:06.1269973Z  * [new branch]              gh/pearu/108/head           -> origin/gh/pearu/108/head
2025-12-04T08:57:06.1271578Z  * [new branch]              gh/pearu/108/orig           -> origin/gh/pearu/108/orig
2025-12-04T08:57:06.1273734Z  * [new branch]              gh/pearu/109/base           -> origin/gh/pearu/109/base
2025-12-04T08:57:06.1275395Z  * [new branch]              gh/pearu/109/head           -> origin/gh/pearu/109/head
2025-12-04T08:57:06.1276967Z  * [new branch]              gh/pearu/109/orig           -> origin/gh/pearu/109/orig
2025-12-04T08:57:06.1279030Z  * [new branch]              gh/pearu/110/base           -> origin/gh/pearu/110/base
2025-12-04T08:57:06.1280832Z  * [new branch]              gh/pearu/110/head           -> origin/gh/pearu/110/head
2025-12-04T08:57:06.1282432Z  * [new branch]              gh/pearu/110/orig           -> origin/gh/pearu/110/orig
2025-12-04T08:57:06.1284578Z  * [new branch]              gh/pearu/111/base           -> origin/gh/pearu/111/base
2025-12-04T08:57:06.1286172Z  * [new branch]              gh/pearu/111/head           -> origin/gh/pearu/111/head
2025-12-04T08:57:06.1287774Z  * [new branch]              gh/pearu/111/orig           -> origin/gh/pearu/111/orig
2025-12-04T08:57:06.1289910Z  * [new branch]              gh/pearu/112/base           -> origin/gh/pearu/112/base
2025-12-04T08:57:06.1291690Z  * [new branch]              gh/pearu/112/head           -> origin/gh/pearu/112/head
2025-12-04T08:57:06.1293269Z  * [new branch]              gh/pearu/112/orig           -> origin/gh/pearu/112/orig
2025-12-04T08:57:06.1295334Z  * [new branch]              gh/pearu/115/base           -> origin/gh/pearu/115/base
2025-12-04T08:57:06.1296908Z  * [new branch]              gh/pearu/115/head           -> origin/gh/pearu/115/head
2025-12-04T08:57:06.1298497Z  * [new branch]              gh/pearu/115/orig           -> origin/gh/pearu/115/orig
2025-12-04T08:57:06.1300491Z  * [new branch]              gh/pearu/116/base           -> origin/gh/pearu/116/base
2025-12-04T08:57:06.1302045Z  * [new branch]              gh/pearu/116/head           -> origin/gh/pearu/116/head
2025-12-04T08:57:06.1303642Z  * [new branch]              gh/pearu/116/orig           -> origin/gh/pearu/116/orig
2025-12-04T08:57:06.1305837Z  * [new branch]              gh/pearu/117/base           -> origin/gh/pearu/117/base
2025-12-04T08:57:06.1307411Z  * [new branch]              gh/pearu/117/head           -> origin/gh/pearu/117/head
2025-12-04T08:57:06.1309005Z  * [new branch]              gh/pearu/117/orig           -> origin/gh/pearu/117/orig
2025-12-04T08:57:06.1311188Z  * [new branch]              gh/pearu/118/base           -> origin/gh/pearu/118/base
2025-12-04T08:57:06.1312760Z  * [new branch]              gh/pearu/118/head           -> origin/gh/pearu/118/head
2025-12-04T08:57:06.1314333Z  * [new branch]              gh/pearu/118/orig           -> origin/gh/pearu/118/orig
2025-12-04T08:57:06.1316450Z  * [new branch]              gh/pearu/119/base           -> origin/gh/pearu/119/base
2025-12-04T08:57:06.1318221Z  * [new branch]              gh/pearu/119/head           -> origin/gh/pearu/119/head
2025-12-04T08:57:06.1319788Z  * [new branch]              gh/pearu/119/orig           -> origin/gh/pearu/119/orig
2025-12-04T08:57:06.1322015Z  * [new branch]              gh/pearu/139/base           -> origin/gh/pearu/139/base
2025-12-04T08:57:06.1323600Z  * [new branch]              gh/pearu/139/head           -> origin/gh/pearu/139/head
2025-12-04T08:57:06.1325161Z  * [new branch]              gh/pearu/139/orig           -> origin/gh/pearu/139/orig
2025-12-04T08:57:06.1327361Z  * [new branch]              gh/pearu/140/base           -> origin/gh/pearu/140/base
2025-12-04T08:57:06.1328932Z  * [new branch]              gh/pearu/140/head           -> origin/gh/pearu/140/head
2025-12-04T08:57:06.1330689Z  * [new branch]              gh/pearu/140/orig           -> origin/gh/pearu/140/orig
2025-12-04T08:57:06.1332671Z  * [new branch]              gh/pearu/142/base           -> origin/gh/pearu/142/base
2025-12-04T08:57:06.1334206Z  * [new branch]              gh/pearu/142/head           -> origin/gh/pearu/142/head
2025-12-04T08:57:06.1335762Z  * [new branch]              gh/pearu/142/orig           -> origin/gh/pearu/142/orig
2025-12-04T08:57:06.1337899Z  * [new branch]              gh/pearu/143/base           -> origin/gh/pearu/143/base
2025-12-04T08:57:06.1339448Z  * [new branch]              gh/pearu/143/head           -> origin/gh/pearu/143/head
2025-12-04T08:57:06.1341062Z  * [new branch]              gh/pearu/143/orig           -> origin/gh/pearu/143/orig
2025-12-04T08:57:06.1343220Z  * [new branch]              gh/pearu/147/base           -> origin/gh/pearu/147/base
2025-12-04T08:57:06.1344818Z  * [new branch]              gh/pearu/147/head           -> origin/gh/pearu/147/head
2025-12-04T08:57:06.1346382Z  * [new branch]              gh/pearu/147/orig           -> origin/gh/pearu/147/orig
2025-12-04T08:57:06.1348619Z  * [new branch]              gh/pearu/149/base           -> origin/gh/pearu/149/base
2025-12-04T08:57:06.1350166Z  * [new branch]              gh/pearu/149/head           -> origin/gh/pearu/149/head
2025-12-04T08:57:06.1351745Z  * [new branch]              gh/pearu/149/orig           -> origin/gh/pearu/149/orig
2025-12-04T08:57:06.1354366Z  * [new branch]              gh/pearu/150/base           -> origin/gh/pearu/150/base
2025-12-04T08:57:06.1355966Z  * [new branch]              gh/pearu/150/head           -> origin/gh/pearu/150/head
2025-12-04T08:57:06.1357572Z  * [new branch]              gh/pearu/150/orig           -> origin/gh/pearu/150/orig
2025-12-04T08:57:06.1359857Z  * [new branch]              gh/pearu/151/base           -> origin/gh/pearu/151/base
2025-12-04T08:57:06.1361707Z  * [new branch]              gh/pearu/151/head           -> origin/gh/pearu/151/head
2025-12-04T08:57:06.1363843Z  * [new branch]              gh/pearu/151/orig           -> origin/gh/pearu/151/orig
2025-12-04T08:57:06.1366763Z  * [new branch]              gh/pearu/152/base           -> origin/gh/pearu/152/base
2025-12-04T08:57:06.1368342Z  * [new branch]              gh/pearu/152/head           -> origin/gh/pearu/152/head
2025-12-04T08:57:06.1369836Z  * [new branch]              gh/pearu/152/orig           -> origin/gh/pearu/152/orig
2025-12-04T08:57:06.1372078Z  * [new branch]              gh/pearu/153/base           -> origin/gh/pearu/153/base
2025-12-04T08:57:06.1373717Z  * [new branch]              gh/pearu/153/head           -> origin/gh/pearu/153/head
2025-12-04T08:57:06.1375807Z  * [new branch]              gh/pearu/153/orig           -> origin/gh/pearu/153/orig
2025-12-04T08:57:06.1377983Z  * [new branch]              gh/pearu/154/base           -> origin/gh/pearu/154/base
2025-12-04T08:57:06.1379613Z  * [new branch]              gh/pearu/154/head           -> origin/gh/pearu/154/head
2025-12-04T08:57:06.1381646Z  * [new branch]              gh/pearu/154/orig           -> origin/gh/pearu/154/orig
2025-12-04T08:57:06.1383851Z  * [new branch]              gh/pearu/155/base           -> origin/gh/pearu/155/base
2025-12-04T08:57:06.1385428Z  * [new branch]              gh/pearu/155/head           -> origin/gh/pearu/155/head
2025-12-04T08:57:06.1386993Z  * [new branch]              gh/pearu/155/orig           -> origin/gh/pearu/155/orig
2025-12-04T08:57:06.1389158Z  * [new branch]              gh/pearu/156/base           -> origin/gh/pearu/156/base
2025-12-04T08:57:06.1390698Z  * [new branch]              gh/pearu/156/head           -> origin/gh/pearu/156/head
2025-12-04T08:57:06.1392239Z  * [new branch]              gh/pearu/156/orig           -> origin/gh/pearu/156/orig
2025-12-04T08:57:06.1394900Z  * [new branch]              gh/pearu/56/base            -> origin/gh/pearu/56/base
2025-12-04T08:57:06.1396922Z  * [new branch]              gh/pearu/56/head            -> origin/gh/pearu/56/head
2025-12-04T08:57:06.1398671Z  * [new branch]              gh/pearu/56/orig            -> origin/gh/pearu/56/orig
2025-12-04T08:57:06.1401154Z  * [new branch]              gh/pearu/97/base            -> origin/gh/pearu/97/base
2025-12-04T08:57:06.1402792Z  * [new branch]              gh/pearu/97/head            -> origin/gh/pearu/97/head
2025-12-04T08:57:06.1404491Z  * [new branch]              gh/pearu/97/orig            -> origin/gh/pearu/97/orig
2025-12-04T08:57:06.1407008Z  * [new branch]              gh/pianpwk/21/base          -> origin/gh/pianpwk/21/base
2025-12-04T08:57:06.1408560Z  * [new branch]              gh/pianpwk/21/head          -> origin/gh/pianpwk/21/head
2025-12-04T08:57:06.1410666Z  * [new branch]              gh/pianpwk/28/base          -> origin/gh/pianpwk/28/base
2025-12-04T08:57:06.1412243Z  * [new branch]              gh/pianpwk/28/head          -> origin/gh/pianpwk/28/head
2025-12-04T08:57:06.1413809Z  * [new branch]              gh/pianpwk/28/orig          -> origin/gh/pianpwk/28/orig
2025-12-04T08:57:06.1416103Z  * [new branch]              gh/pianpwk/29/base          -> origin/gh/pianpwk/29/base
2025-12-04T08:57:06.1420025Z  * [new branch]              gh/pianpwk/29/head          -> origin/gh/pianpwk/29/head
2025-12-04T08:57:06.1421666Z  * [new branch]              gh/pianpwk/29/orig          -> origin/gh/pianpwk/29/orig
2025-12-04T08:57:06.1424318Z  * [new branch]              gh/pianpwk/30/base          -> origin/gh/pianpwk/30/base
2025-12-04T08:57:06.1425840Z  * [new branch]              gh/pianpwk/30/head          -> origin/gh/pianpwk/30/head
2025-12-04T08:57:06.1427479Z  * [new branch]              gh/pianpwk/30/orig          -> origin/gh/pianpwk/30/orig
2025-12-04T08:57:06.1429733Z  * [new branch]              gh/pianpwk/31/base          -> origin/gh/pianpwk/31/base
2025-12-04T08:57:06.1431335Z  * [new branch]              gh/pianpwk/31/head          -> origin/gh/pianpwk/31/head
2025-12-04T08:57:06.1432914Z  * [new branch]              gh/pianpwk/31/orig          -> origin/gh/pianpwk/31/orig
2025-12-04T08:57:06.1434878Z  * [new branch]              gh/pianpwk/32/base          -> origin/gh/pianpwk/32/base
2025-12-04T08:57:06.1436470Z  * [new branch]              gh/pianpwk/32/head          -> origin/gh/pianpwk/32/head
2025-12-04T08:57:06.1437994Z  * [new branch]              gh/pianpwk/32/orig          -> origin/gh/pianpwk/32/orig
2025-12-04T08:57:06.1440010Z  * [new branch]              gh/pianpwk/33/base          -> origin/gh/pianpwk/33/base
2025-12-04T08:57:06.1441677Z  * [new branch]              gh/pianpwk/33/head          -> origin/gh/pianpwk/33/head
2025-12-04T08:57:06.1443273Z  * [new branch]              gh/pianpwk/33/orig          -> origin/gh/pianpwk/33/orig
2025-12-04T08:57:06.1445624Z  * [new branch]              gh/pianpwk/34/base          -> origin/gh/pianpwk/34/base
2025-12-04T08:57:06.1447427Z  * [new branch]              gh/pianpwk/34/head          -> origin/gh/pianpwk/34/head
2025-12-04T08:57:06.1449269Z  * [new branch]              gh/pianpwk/34/orig          -> origin/gh/pianpwk/34/orig
2025-12-04T08:57:06.1451540Z  * [new branch]              gh/pianpwk/35/base          -> origin/gh/pianpwk/35/base
2025-12-04T08:57:06.1453151Z  * [new branch]              gh/pianpwk/35/head          -> origin/gh/pianpwk/35/head
2025-12-04T08:57:06.1454787Z  * [new branch]              gh/pianpwk/35/orig          -> origin/gh/pianpwk/35/orig
2025-12-04T08:57:06.1457340Z  * [new branch]              gh/rec/141/base             -> origin/gh/rec/141/base
2025-12-04T08:57:06.1458951Z  * [new branch]              gh/rec/141/head             -> origin/gh/rec/141/head
2025-12-04T08:57:06.1461159Z  * [new branch]              gh/rec/153/base             -> origin/gh/rec/153/base
2025-12-04T08:57:06.1462747Z  * [new branch]              gh/rec/153/head             -> origin/gh/rec/153/head
2025-12-04T08:57:06.1464324Z  * [new branch]              gh/rec/153/orig             -> origin/gh/rec/153/orig
2025-12-04T08:57:06.1466408Z  * [new branch]              gh/rec/154/base             -> origin/gh/rec/154/base
2025-12-04T08:57:06.1468272Z  * [new branch]              gh/rec/154/head             -> origin/gh/rec/154/head
2025-12-04T08:57:06.1469646Z  * [new branch]              gh/rec/154/orig             -> origin/gh/rec/154/orig
2025-12-04T08:57:06.1471709Z  * [new branch]              gh/rec/164/base             -> origin/gh/rec/164/base
2025-12-04T08:57:06.1473296Z  * [new branch]              gh/rec/164/head             -> origin/gh/rec/164/head
2025-12-04T08:57:06.1474873Z  * [new branch]              gh/rec/164/orig             -> origin/gh/rec/164/orig
2025-12-04T08:57:06.1476957Z  * [new branch]              gh/rec/166/base             -> origin/gh/rec/166/base
2025-12-04T08:57:06.1478494Z  * [new branch]              gh/rec/166/head             -> origin/gh/rec/166/head
2025-12-04T08:57:06.1480171Z  * [new branch]              gh/rec/166/orig             -> origin/gh/rec/166/orig
2025-12-04T08:57:06.1482418Z  * [new branch]              gh/rec/167/base             -> origin/gh/rec/167/base
2025-12-04T08:57:06.1483990Z  * [new branch]              gh/rec/167/head             -> origin/gh/rec/167/head
2025-12-04T08:57:06.1485552Z  * [new branch]              gh/rec/167/orig             -> origin/gh/rec/167/orig
2025-12-04T08:57:06.1487675Z  * [new branch]              gh/rec/168/base             -> origin/gh/rec/168/base
2025-12-04T08:57:06.1489336Z  * [new branch]              gh/rec/168/head             -> origin/gh/rec/168/head
2025-12-04T08:57:06.1490992Z  * [new branch]              gh/rec/168/orig             -> origin/gh/rec/168/orig
2025-12-04T08:57:06.1493085Z  * [new branch]              gh/rec/169/base             -> origin/gh/rec/169/base
2025-12-04T08:57:06.1494654Z  * [new branch]              gh/rec/169/head             -> origin/gh/rec/169/head
2025-12-04T08:57:06.1496266Z  * [new branch]              gh/rec/169/orig             -> origin/gh/rec/169/orig
2025-12-04T08:57:06.1498829Z  * [new branch]              gh/rec/170/base             -> origin/gh/rec/170/base
2025-12-04T08:57:06.1500279Z  * [new branch]              gh/rec/170/head             -> origin/gh/rec/170/head
2025-12-04T08:57:06.1501867Z  * [new branch]              gh/rec/170/orig             -> origin/gh/rec/170/orig
2025-12-04T08:57:06.1504029Z  * [new branch]              gh/rec/171/base             -> origin/gh/rec/171/base
2025-12-04T08:57:06.1505650Z  * [new branch]              gh/rec/171/head             -> origin/gh/rec/171/head
2025-12-04T08:57:06.1507297Z  * [new branch]              gh/rec/171/orig             -> origin/gh/rec/171/orig
2025-12-04T08:57:06.1509362Z  * [new branch]              gh/rec/172/base             -> origin/gh/rec/172/base
2025-12-04T08:57:06.1511044Z  * [new branch]              gh/rec/172/head             -> origin/gh/rec/172/head
2025-12-04T08:57:06.1512531Z  * [new branch]              gh/rec/172/orig             -> origin/gh/rec/172/orig
2025-12-04T08:57:06.1514611Z  * [new branch]              gh/rec/173/base             -> origin/gh/rec/173/base
2025-12-04T08:57:06.1516251Z  * [new branch]              gh/rec/173/head             -> origin/gh/rec/173/head
2025-12-04T08:57:06.1517986Z  * [new branch]              gh/rec/173/orig             -> origin/gh/rec/173/orig
2025-12-04T08:57:06.1520164Z  * [new branch]              gh/rec/174/base             -> origin/gh/rec/174/base
2025-12-04T08:57:06.1522235Z  * [new branch]              gh/rec/174/head             -> origin/gh/rec/174/head
2025-12-04T08:57:06.1523780Z  * [new branch]              gh/rec/174/orig             -> origin/gh/rec/174/orig
2025-12-04T08:57:06.1525945Z  * [new branch]              gh/rec/175/base             -> origin/gh/rec/175/base
2025-12-04T08:57:06.1527509Z  * [new branch]              gh/rec/175/head             -> origin/gh/rec/175/head
2025-12-04T08:57:06.1529190Z  * [new branch]              gh/rec/175/orig             -> origin/gh/rec/175/orig
2025-12-04T08:57:06.1531222Z  * [new branch]              gh/rec/176/base             -> origin/gh/rec/176/base
2025-12-04T08:57:06.1532828Z  * [new branch]              gh/rec/176/head             -> origin/gh/rec/176/head
2025-12-04T08:57:06.1534625Z  * [new branch]              gh/rec/176/orig             -> origin/gh/rec/176/orig
2025-12-04T08:57:06.1536553Z  * [new branch]              gh/rec/177/base             -> origin/gh/rec/177/base
2025-12-04T08:57:06.1538084Z  * [new branch]              gh/rec/177/head             -> origin/gh/rec/177/head
2025-12-04T08:57:06.1539557Z  * [new branch]              gh/rec/177/orig             -> origin/gh/rec/177/orig
2025-12-04T08:57:06.1542146Z  * [new branch]              gh/robert-hardwick/3/base   -> origin/gh/robert-hardwick/3/base
2025-12-04T08:57:06.1543790Z  * [new branch]              gh/robert-hardwick/3/head   -> origin/gh/robert-hardwick/3/head
2025-12-04T08:57:06.1545443Z  * [new branch]              gh/robert-hardwick/3/orig   -> origin/gh/robert-hardwick/3/orig
2025-12-04T08:57:06.1547535Z  * [new branch]              gh/robert-hardwick/4/base   -> origin/gh/robert-hardwick/4/base
2025-12-04T08:57:06.1549161Z  * [new branch]              gh/robert-hardwick/4/head   -> origin/gh/robert-hardwick/4/head
2025-12-04T08:57:06.1551180Z  * [new branch]              gh/robert-hardwick/4/orig   -> origin/gh/robert-hardwick/4/orig
2025-12-04T08:57:06.1553227Z  * [new branch]              gh/robert-hardwick/5/base   -> origin/gh/robert-hardwick/5/base
2025-12-04T08:57:06.1554804Z  * [new branch]              gh/robert-hardwick/5/head   -> origin/gh/robert-hardwick/5/head
2025-12-04T08:57:06.1556406Z  * [new branch]              gh/robert-hardwick/5/orig   -> origin/gh/robert-hardwick/5/orig
2025-12-04T08:57:06.1558574Z  * [new branch]              gh/robert-hardwick/6/base   -> origin/gh/robert-hardwick/6/base
2025-12-04T08:57:06.1560202Z  * [new branch]              gh/robert-hardwick/6/head   -> origin/gh/robert-hardwick/6/head
2025-12-04T08:57:06.1561866Z  * [new branch]              gh/robert-hardwick/6/orig   -> origin/gh/robert-hardwick/6/orig
2025-12-04T08:57:06.1563971Z  * [new branch]              gh/robert-hardwick/7/base   -> origin/gh/robert-hardwick/7/base
2025-12-04T08:57:06.1565559Z  * [new branch]              gh/robert-hardwick/7/head   -> origin/gh/robert-hardwick/7/head
2025-12-04T08:57:06.1567210Z  * [new branch]              gh/robert-hardwick/7/orig   -> origin/gh/robert-hardwick/7/orig
2025-12-04T08:57:06.1569326Z  * [new branch]              gh/robert-hardwick/8/base   -> origin/gh/robert-hardwick/8/base
2025-12-04T08:57:06.1571016Z  * [new branch]              gh/robert-hardwick/8/head   -> origin/gh/robert-hardwick/8/head
2025-12-04T08:57:06.1572635Z  * [new branch]              gh/robert-hardwick/8/orig   -> origin/gh/robert-hardwick/8/orig
2025-12-04T08:57:06.1574709Z  * [new branch]              gh/robert-hardwick/9/base   -> origin/gh/robert-hardwick/9/base
2025-12-04T08:57:06.1576272Z  * [new branch]              gh/robert-hardwick/9/head   -> origin/gh/robert-hardwick/9/head
2025-12-04T08:57:06.1577852Z  * [new branch]              gh/robert-hardwick/9/orig   -> origin/gh/robert-hardwick/9/orig
2025-12-04T08:57:06.1580920Z  * [new branch]              gh/rtimpe/1/base            -> origin/gh/rtimpe/1/base
2025-12-04T08:57:06.1582464Z  * [new branch]              gh/rtimpe/1/head            -> origin/gh/rtimpe/1/head
2025-12-04T08:57:06.1584540Z  * [new branch]              gh/rtimpe/2/base            -> origin/gh/rtimpe/2/base
2025-12-04T08:57:06.1586125Z  * [new branch]              gh/rtimpe/2/head            -> origin/gh/rtimpe/2/head
2025-12-04T08:57:06.1588307Z  * [new branch]              gh/rtimpe/22/base           -> origin/gh/rtimpe/22/base
2025-12-04T08:57:06.1589831Z  * [new branch]              gh/rtimpe/22/head           -> origin/gh/rtimpe/22/head
2025-12-04T08:57:06.1591524Z  * [new branch]              gh/rtimpe/22/orig           -> origin/gh/rtimpe/22/orig
2025-12-04T08:57:06.1593554Z  * [new branch]              gh/rtimpe/23/base           -> origin/gh/rtimpe/23/base
2025-12-04T08:57:06.1595121Z  * [new branch]              gh/rtimpe/23/head           -> origin/gh/rtimpe/23/head
2025-12-04T08:57:06.1596818Z  * [new branch]              gh/rtimpe/23/orig           -> origin/gh/rtimpe/23/orig
2025-12-04T08:57:06.1598785Z  * [new branch]              gh/rtimpe/24/base           -> origin/gh/rtimpe/24/base
2025-12-04T08:57:06.1600361Z  * [new branch]              gh/rtimpe/24/head           -> origin/gh/rtimpe/24/head
2025-12-04T08:57:06.1602018Z  * [new branch]              gh/rtimpe/24/orig           -> origin/gh/rtimpe/24/orig
2025-12-04T08:57:06.1604072Z  * [new branch]              gh/rtimpe/25/base           -> origin/gh/rtimpe/25/base
2025-12-04T08:57:06.1605663Z  * [new branch]              gh/rtimpe/25/head           -> origin/gh/rtimpe/25/head
2025-12-04T08:57:06.1607701Z  * [new branch]              gh/rtimpe/25/orig           -> origin/gh/rtimpe/25/orig
2025-12-04T08:57:06.1610092Z  * [new branch]              gh/rtimpe/26/base           -> origin/gh/rtimpe/26/base
2025-12-04T08:57:06.1611748Z  * [new branch]              gh/rtimpe/26/head           -> origin/gh/rtimpe/26/head
2025-12-04T08:57:06.1613310Z  * [new branch]              gh/rtimpe/26/orig           -> origin/gh/rtimpe/26/orig
2025-12-04T08:57:06.1615404Z  * [new branch]              gh/rtimpe/27/base           -> origin/gh/rtimpe/27/base
2025-12-04T08:57:06.1616929Z  * [new branch]              gh/rtimpe/27/head           -> origin/gh/rtimpe/27/head
2025-12-04T08:57:06.1618728Z  * [new branch]              gh/rtimpe/27/orig           -> origin/gh/rtimpe/27/orig
2025-12-04T08:57:06.1620645Z  * [new branch]              gh/rtimpe/28/base           -> origin/gh/rtimpe/28/base
2025-12-04T08:57:06.1622406Z  * [new branch]              gh/rtimpe/28/head           -> origin/gh/rtimpe/28/head
2025-12-04T08:57:06.1623966Z  * [new branch]              gh/rtimpe/28/orig           -> origin/gh/rtimpe/28/orig
2025-12-04T08:57:06.1626176Z  * [new branch]              gh/rtimpe/29/base           -> origin/gh/rtimpe/29/base
2025-12-04T08:57:06.1627715Z  * [new branch]              gh/rtimpe/29/head           -> origin/gh/rtimpe/29/head
2025-12-04T08:57:06.1629310Z  * [new branch]              gh/rtimpe/29/orig           -> origin/gh/rtimpe/29/orig
2025-12-04T08:57:06.1631761Z  * [new branch]              gh/rtimpe/3/base            -> origin/gh/rtimpe/3/base
2025-12-04T08:57:06.1633290Z  * [new branch]              gh/rtimpe/3/head            -> origin/gh/rtimpe/3/head
2025-12-04T08:57:06.1635376Z  * [new branch]              gh/rtimpe/30/base           -> origin/gh/rtimpe/30/base
2025-12-04T08:57:06.1636956Z  * [new branch]              gh/rtimpe/30/head           -> origin/gh/rtimpe/30/head
2025-12-04T08:57:06.1638541Z  * [new branch]              gh/rtimpe/30/orig           -> origin/gh/rtimpe/30/orig
2025-12-04T08:57:06.1640697Z  * [new branch]              gh/rtimpe/31/base           -> origin/gh/rtimpe/31/base
2025-12-04T08:57:06.1642248Z  * [new branch]              gh/rtimpe/31/head           -> origin/gh/rtimpe/31/head
2025-12-04T08:57:06.1643985Z  * [new branch]              gh/rtimpe/31/orig           -> origin/gh/rtimpe/31/orig
2025-12-04T08:57:06.1646033Z  * [new branch]              gh/rtimpe/32/base           -> origin/gh/rtimpe/32/base
2025-12-04T08:57:06.1647701Z  * [new branch]              gh/rtimpe/32/head           -> origin/gh/rtimpe/32/head
2025-12-04T08:57:06.1649288Z  * [new branch]              gh/rtimpe/32/orig           -> origin/gh/rtimpe/32/orig
2025-12-04T08:57:06.1651594Z  * [new branch]              gh/rtimpe/33/base           -> origin/gh/rtimpe/33/base
2025-12-04T08:57:06.1653586Z  * [new branch]              gh/rtimpe/33/head           -> origin/gh/rtimpe/33/head
2025-12-04T08:57:06.1655214Z  * [new branch]              gh/rtimpe/33/orig           -> origin/gh/rtimpe/33/orig
2025-12-04T08:57:06.1657255Z  * [new branch]              gh/rtimpe/34/base           -> origin/gh/rtimpe/34/base
2025-12-04T08:57:06.1658833Z  * [new branch]              gh/rtimpe/34/head           -> origin/gh/rtimpe/34/head
2025-12-04T08:57:06.1660411Z  * [new branch]              gh/rtimpe/34/orig           -> origin/gh/rtimpe/34/orig
2025-12-04T08:57:06.1662749Z  * [new branch]              gh/rtimpe/35/base           -> origin/gh/rtimpe/35/base
2025-12-04T08:57:06.1664162Z  * [new branch]              gh/rtimpe/35/head           -> origin/gh/rtimpe/35/head
2025-12-04T08:57:06.1665764Z  * [new branch]              gh/rtimpe/35/orig           -> origin/gh/rtimpe/35/orig
2025-12-04T08:57:06.1667820Z  * [new branch]              gh/rtimpe/4/base            -> origin/gh/rtimpe/4/base
2025-12-04T08:57:06.1669432Z  * [new branch]              gh/rtimpe/4/head            -> origin/gh/rtimpe/4/head
2025-12-04T08:57:06.1672628Z  * [new branch]              gh/ruisizhang123/1/base     -> origin/gh/ruisizhang123/1/base
2025-12-04T08:57:06.1674242Z  * [new branch]              gh/ruisizhang123/1/head     -> origin/gh/ruisizhang123/1/head
2025-12-04T08:57:06.1675825Z  * [new branch]              gh/ruisizhang123/1/orig     -> origin/gh/ruisizhang123/1/orig
2025-12-04T08:57:06.1677935Z  * [new branch]              gh/ruisizhang123/4/base     -> origin/gh/ruisizhang123/4/base
2025-12-04T08:57:06.1679623Z  * [new branch]              gh/ruisizhang123/4/head     -> origin/gh/ruisizhang123/4/head
2025-12-04T08:57:06.1681449Z  * [new branch]              gh/ruisizhang123/4/orig     -> origin/gh/ruisizhang123/4/orig
2025-12-04T08:57:06.1683554Z  * [new branch]              gh/ruisizhang123/5/base     -> origin/gh/ruisizhang123/5/base
2025-12-04T08:57:06.1685100Z  * [new branch]              gh/ruisizhang123/5/head     -> origin/gh/ruisizhang123/5/head
2025-12-04T08:57:06.1687149Z  * [new branch]              gh/ruisizhang123/5/orig     -> origin/gh/ruisizhang123/5/orig
2025-12-04T08:57:06.1689139Z  * [new branch]              gh/ruisizhang123/6/base     -> origin/gh/ruisizhang123/6/base
2025-12-04T08:57:06.1690857Z  * [new branch]              gh/ruisizhang123/6/head     -> origin/gh/ruisizhang123/6/head
2025-12-04T08:57:06.1692432Z  * [new branch]              gh/ruisizhang123/6/orig     -> origin/gh/ruisizhang123/6/orig
2025-12-04T08:57:06.1695251Z  * [new branch]              gh/ruisizhang123/7/base     -> origin/gh/ruisizhang123/7/base
2025-12-04T08:57:06.1696725Z  * [new branch]              gh/ruisizhang123/7/head     -> origin/gh/ruisizhang123/7/head
2025-12-04T08:57:06.1698253Z  * [new branch]              gh/ruisizhang123/7/orig     -> origin/gh/ruisizhang123/7/orig
2025-12-04T08:57:06.1700228Z  * [new branch]              gh/ruisizhang123/8/base     -> origin/gh/ruisizhang123/8/base
2025-12-04T08:57:06.1701765Z  * [new branch]              gh/ruisizhang123/8/head     -> origin/gh/ruisizhang123/8/head
2025-12-04T08:57:06.1703328Z  * [new branch]              gh/ruisizhang123/8/orig     -> origin/gh/ruisizhang123/8/orig
2025-12-04T08:57:06.1705442Z  * [new branch]              gh/ruisizhang123/9/base     -> origin/gh/ruisizhang123/9/base
2025-12-04T08:57:06.1707010Z  * [new branch]              gh/ruisizhang123/9/head     -> origin/gh/ruisizhang123/9/head
2025-12-04T08:57:06.1708605Z  * [new branch]              gh/ruisizhang123/9/orig     -> origin/gh/ruisizhang123/9/orig
2025-12-04T08:57:06.1711274Z  * [new branch]              gh/seemethere/52/base       -> origin/gh/seemethere/52/base
2025-12-04T08:57:06.1712829Z  * [new branch]              gh/seemethere/52/head       -> origin/gh/seemethere/52/head
2025-12-04T08:57:06.1714434Z  * [new branch]              gh/seemethere/52/orig       -> origin/gh/seemethere/52/orig
2025-12-04T08:57:06.1717166Z  * [new branch]              gh/seemethere/53/base       -> origin/gh/seemethere/53/base
2025-12-04T08:57:06.1718852Z  * [new branch]              gh/seemethere/53/head       -> origin/gh/seemethere/53/head
2025-12-04T08:57:06.1720504Z  * [new branch]              gh/seemethere/53/orig       -> origin/gh/seemethere/53/orig
2025-12-04T08:57:06.1722657Z  * [new branch]              gh/seemethere/54/base       -> origin/gh/seemethere/54/base
2025-12-04T08:57:06.1724261Z  * [new branch]              gh/seemethere/54/head       -> origin/gh/seemethere/54/head
2025-12-04T08:57:06.1725832Z  * [new branch]              gh/seemethere/54/orig       -> origin/gh/seemethere/54/orig
2025-12-04T08:57:06.1728018Z  * [new branch]              gh/seemethere/55/base       -> origin/gh/seemethere/55/base
2025-12-04T08:57:06.1729528Z  * [new branch]              gh/seemethere/55/head       -> origin/gh/seemethere/55/head
2025-12-04T08:57:06.1731077Z  * [new branch]              gh/seemethere/55/orig       -> origin/gh/seemethere/55/orig
2025-12-04T08:57:06.1733137Z  * [new branch]              gh/seemethere/59/base       -> origin/gh/seemethere/59/base
2025-12-04T08:57:06.1734701Z  * [new branch]              gh/seemethere/59/head       -> origin/gh/seemethere/59/head
2025-12-04T08:57:06.1736318Z  * [new branch]              gh/seemethere/59/orig       -> origin/gh/seemethere/59/orig
2025-12-04T08:57:06.1738923Z  * [new branch]              gh/seemethere/62/base       -> origin/gh/seemethere/62/base
2025-12-04T08:57:06.1740487Z  * [new branch]              gh/seemethere/62/head       -> origin/gh/seemethere/62/head
2025-12-04T08:57:06.1742069Z  * [new branch]              gh/seemethere/62/orig       -> origin/gh/seemethere/62/orig
2025-12-04T08:57:06.1744145Z  * [new branch]              gh/seemethere/63/base       -> origin/gh/seemethere/63/base
2025-12-04T08:57:06.1745772Z  * [new branch]              gh/seemethere/63/head       -> origin/gh/seemethere/63/head
2025-12-04T08:57:06.1747340Z  * [new branch]              gh/seemethere/63/orig       -> origin/gh/seemethere/63/orig
2025-12-04T08:57:06.1749409Z  * [new branch]              gh/seemethere/71/base       -> origin/gh/seemethere/71/base
2025-12-04T08:57:06.1751108Z  * [new branch]              gh/seemethere/71/head       -> origin/gh/seemethere/71/head
2025-12-04T08:57:06.1752667Z  * [new branch]              gh/seemethere/71/orig       -> origin/gh/seemethere/71/orig
2025-12-04T08:57:06.1754795Z  * [new branch]              gh/seemethere/72/base       -> origin/gh/seemethere/72/base
2025-12-04T08:57:06.1756385Z  * [new branch]              gh/seemethere/72/head       -> origin/gh/seemethere/72/head
2025-12-04T08:57:06.1757982Z  * [new branch]              gh/seemethere/72/orig       -> origin/gh/seemethere/72/orig
2025-12-04T08:57:06.1760077Z  * [new branch]              gh/seemethere/73/base       -> origin/gh/seemethere/73/base
2025-12-04T08:57:06.1761817Z  * [new branch]              gh/seemethere/73/head       -> origin/gh/seemethere/73/head
2025-12-04T08:57:06.1763467Z  * [new branch]              gh/seemethere/73/orig       -> origin/gh/seemethere/73/orig
2025-12-04T08:57:06.1765572Z  * [new branch]              gh/seemethere/74/base       -> origin/gh/seemethere/74/base
2025-12-04T08:57:06.1767152Z  * [new branch]              gh/seemethere/74/head       -> origin/gh/seemethere/74/head
2025-12-04T08:57:06.1769134Z  * [new branch]              gh/seemethere/74/orig       -> origin/gh/seemethere/74/orig
2025-12-04T08:57:06.1771296Z  * [new branch]              gh/seemethere/75/base       -> origin/gh/seemethere/75/base
2025-12-04T08:57:06.1772895Z  * [new branch]              gh/seemethere/75/head       -> origin/gh/seemethere/75/head
2025-12-04T08:57:06.1774493Z  * [new branch]              gh/seemethere/75/orig       -> origin/gh/seemethere/75/orig
2025-12-04T08:57:06.1776625Z  * [new branch]              gh/seemethere/76/base       -> origin/gh/seemethere/76/base
2025-12-04T08:57:06.1778200Z  * [new branch]              gh/seemethere/76/head       -> origin/gh/seemethere/76/head
2025-12-04T08:57:06.1779834Z  * [new branch]              gh/seemethere/76/orig       -> origin/gh/seemethere/76/orig
2025-12-04T08:57:06.1782595Z  * [new branch]              gh/shunting314/145/base     -> origin/gh/shunting314/145/base
2025-12-04T08:57:06.1784308Z  * [new branch]              gh/shunting314/145/head     -> origin/gh/shunting314/145/head
2025-12-04T08:57:06.1785991Z  * [new branch]              gh/shunting314/145/orig     -> origin/gh/shunting314/145/orig
2025-12-04T08:57:06.1788306Z  * [new branch]              gh/shunting314/176/base     -> origin/gh/shunting314/176/base
2025-12-04T08:57:06.1790180Z  * [new branch]              gh/shunting314/176/head     -> origin/gh/shunting314/176/head
2025-12-04T08:57:06.1791694Z  * [new branch]              gh/shunting314/176/orig     -> origin/gh/shunting314/176/orig
2025-12-04T08:57:06.1793827Z  * [new branch]              gh/shunting314/249/base     -> origin/gh/shunting314/249/base
2025-12-04T08:57:06.1795499Z  * [new branch]              gh/shunting314/249/head     -> origin/gh/shunting314/249/head
2025-12-04T08:57:06.1797549Z  * [new branch]              gh/shunting314/249/orig     -> origin/gh/shunting314/249/orig
2025-12-04T08:57:06.1799843Z  * [new branch]              gh/shunting314/253/base     -> origin/gh/shunting314/253/base
2025-12-04T08:57:06.1801492Z  * [new branch]              gh/shunting314/253/head     -> origin/gh/shunting314/253/head
2025-12-04T08:57:06.1803107Z  * [new branch]              gh/shunting314/253/orig     -> origin/gh/shunting314/253/orig
2025-12-04T08:57:06.1805284Z  * [new branch]              gh/shunting314/256/base     -> origin/gh/shunting314/256/base
2025-12-04T08:57:06.1806857Z  * [new branch]              gh/shunting314/256/head     -> origin/gh/shunting314/256/head
2025-12-04T08:57:06.1808426Z  * [new branch]              gh/shunting314/256/orig     -> origin/gh/shunting314/256/orig
2025-12-04T08:57:06.1810762Z  * [new branch]              gh/shunting314/257/base     -> origin/gh/shunting314/257/base
2025-12-04T08:57:06.1812328Z  * [new branch]              gh/shunting314/257/head     -> origin/gh/shunting314/257/head
2025-12-04T08:57:06.1813931Z  * [new branch]              gh/shunting314/257/orig     -> origin/gh/shunting314/257/orig
2025-12-04T08:57:06.1816178Z  * [new branch]              gh/shunting314/258/base     -> origin/gh/shunting314/258/base
2025-12-04T08:57:06.1819124Z  * [new branch]              gh/shunting314/258/head     -> origin/gh/shunting314/258/head
2025-12-04T08:57:06.1820698Z  * [new branch]              gh/shunting314/258/orig     -> origin/gh/shunting314/258/orig
2025-12-04T08:57:06.1822691Z  * [new branch]              gh/shunting314/259/base     -> origin/gh/shunting314/259/base
2025-12-04T08:57:06.1824354Z  * [new branch]              gh/shunting314/259/head     -> origin/gh/shunting314/259/head
2025-12-04T08:57:06.1826024Z  * [new branch]              gh/shunting314/259/orig     -> origin/gh/shunting314/259/orig
2025-12-04T08:57:06.1828178Z  * [new branch]              gh/shunting314/260/base     -> origin/gh/shunting314/260/base
2025-12-04T08:57:06.1829811Z  * [new branch]              gh/shunting314/260/head     -> origin/gh/shunting314/260/head
2025-12-04T08:57:06.1831473Z  * [new branch]              gh/shunting314/260/orig     -> origin/gh/shunting314/260/orig
2025-12-04T08:57:06.1833681Z  * [new branch]              gh/shunting314/261/base     -> origin/gh/shunting314/261/base
2025-12-04T08:57:06.1835382Z  * [new branch]              gh/shunting314/261/head     -> origin/gh/shunting314/261/head
2025-12-04T08:57:06.1837175Z  * [new branch]              gh/shunting314/261/orig     -> origin/gh/shunting314/261/orig
2025-12-04T08:57:06.1839311Z  * [new branch]              gh/shunting314/262/base     -> origin/gh/shunting314/262/base
2025-12-04T08:57:06.1841228Z  * [new branch]              gh/shunting314/262/head     -> origin/gh/shunting314/262/head
2025-12-04T08:57:06.1842808Z  * [new branch]              gh/shunting314/262/orig     -> origin/gh/shunting314/262/orig
2025-12-04T08:57:06.1845052Z  * [new branch]              gh/shunting314/263/base     -> origin/gh/shunting314/263/base
2025-12-04T08:57:06.1846755Z  * [new branch]              gh/shunting314/263/head     -> origin/gh/shunting314/263/head
2025-12-04T08:57:06.1848418Z  * [new branch]              gh/shunting314/263/orig     -> origin/gh/shunting314/263/orig
2025-12-04T08:57:06.1850497Z  * [new branch]              gh/shunting314/264/base     -> origin/gh/shunting314/264/base
2025-12-04T08:57:06.1852489Z  * [new branch]              gh/shunting314/264/head     -> origin/gh/shunting314/264/head
2025-12-04T08:57:06.1854313Z  * [new branch]              gh/shunting314/264/orig     -> origin/gh/shunting314/264/orig
2025-12-04T08:57:06.1856347Z  * [new branch]              gh/shunting314/265/base     -> origin/gh/shunting314/265/base
2025-12-04T08:57:06.1857847Z  * [new branch]              gh/shunting314/265/head     -> origin/gh/shunting314/265/head
2025-12-04T08:57:06.1859366Z  * [new branch]              gh/shunting314/265/orig     -> origin/gh/shunting314/265/orig
2025-12-04T08:57:06.1861478Z  * [new branch]              gh/shunting314/266/base     -> origin/gh/shunting314/266/base
2025-12-04T08:57:06.1863183Z  * [new branch]              gh/shunting314/266/head     -> origin/gh/shunting314/266/head
2025-12-04T08:57:06.1864753Z  * [new branch]              gh/shunting314/266/orig     -> origin/gh/shunting314/266/orig
2025-12-04T08:57:06.1867058Z  * [new branch]              gh/shunting314/267/base     -> origin/gh/shunting314/267/base
2025-12-04T08:57:06.1868792Z  * [new branch]              gh/shunting314/267/head     -> origin/gh/shunting314/267/head
2025-12-04T08:57:06.1870428Z  * [new branch]              gh/shunting314/267/orig     -> origin/gh/shunting314/267/orig
2025-12-04T08:57:06.1873006Z  * [new branch]              gh/shunting314/268/base     -> origin/gh/shunting314/268/base
2025-12-04T08:57:06.1875095Z  * [new branch]              gh/shunting314/268/head     -> origin/gh/shunting314/268/head
2025-12-04T08:57:06.1876726Z  * [new branch]              gh/shunting314/268/orig     -> origin/gh/shunting314/268/orig
2025-12-04T08:57:06.1878993Z  * [new branch]              gh/shunting314/269/base     -> origin/gh/shunting314/269/base
2025-12-04T08:57:06.1880743Z  * [new branch]              gh/shunting314/269/head     -> origin/gh/shunting314/269/head
2025-12-04T08:57:06.1882304Z  * [new branch]              gh/shunting314/269/orig     -> origin/gh/shunting314/269/orig
2025-12-04T08:57:06.1884944Z  * [new branch]              gh/silverguo/1/base         -> origin/gh/silverguo/1/base
2025-12-04T08:57:06.1886600Z  * [new branch]              gh/silverguo/1/head         -> origin/gh/silverguo/1/head
2025-12-04T08:57:06.1888560Z  * [new branch]              gh/silverguo/2/base         -> origin/gh/silverguo/2/base
2025-12-04T08:57:06.1890083Z  * [new branch]              gh/silverguo/2/head         -> origin/gh/silverguo/2/head
2025-12-04T08:57:06.1892143Z  * [new branch]              gh/silverguo/3/base         -> origin/gh/silverguo/3/base
2025-12-04T08:57:06.1893707Z  * [new branch]              gh/silverguo/3/head         -> origin/gh/silverguo/3/head
2025-12-04T08:57:06.1895696Z  * [new branch]              gh/silverguo/4/base         -> origin/gh/silverguo/4/base
2025-12-04T08:57:06.1897239Z  * [new branch]              gh/silverguo/4/head         -> origin/gh/silverguo/4/head
2025-12-04T08:57:06.1899816Z  * [new branch]              gh/slayton58/39/base        -> origin/gh/slayton58/39/base
2025-12-04T08:57:06.1902073Z  * [new branch]              gh/slayton58/39/head        -> origin/gh/slayton58/39/head
2025-12-04T08:57:06.1903715Z  * [new branch]              gh/slayton58/39/orig        -> origin/gh/slayton58/39/orig
2025-12-04T08:57:06.1905888Z  * [new branch]              gh/slayton58/42/base        -> origin/gh/slayton58/42/base
2025-12-04T08:57:06.1907456Z  * [new branch]              gh/slayton58/42/head        -> origin/gh/slayton58/42/head
2025-12-04T08:57:06.1909182Z  * [new branch]              gh/slayton58/42/orig        -> origin/gh/slayton58/42/orig
2025-12-04T08:57:06.1911362Z  * [new branch]              gh/slayton58/43/base        -> origin/gh/slayton58/43/base
2025-12-04T08:57:06.1912963Z  * [new branch]              gh/slayton58/43/head        -> origin/gh/slayton58/43/head
2025-12-04T08:57:06.1914574Z  * [new branch]              gh/slayton58/43/orig        -> origin/gh/slayton58/43/orig
2025-12-04T08:57:06.1916822Z  * [new branch]              gh/slayton58/44/base        -> origin/gh/slayton58/44/base
2025-12-04T08:57:06.1918620Z  * [new branch]              gh/slayton58/44/head        -> origin/gh/slayton58/44/head
2025-12-04T08:57:06.1920331Z  * [new branch]              gh/slayton58/44/orig        -> origin/gh/slayton58/44/orig
2025-12-04T08:57:06.1922414Z  * [new branch]              gh/slayton58/45/base        -> origin/gh/slayton58/45/base
2025-12-04T08:57:06.1923980Z  * [new branch]              gh/slayton58/45/head        -> origin/gh/slayton58/45/head
2025-12-04T08:57:06.1925561Z  * [new branch]              gh/slayton58/45/orig        -> origin/gh/slayton58/45/orig
2025-12-04T08:57:06.1927679Z  * [new branch]              gh/slayton58/46/base        -> origin/gh/slayton58/46/base
2025-12-04T08:57:06.1929402Z  * [new branch]              gh/slayton58/46/head        -> origin/gh/slayton58/46/head
2025-12-04T08:57:06.1931102Z  * [new branch]              gh/slayton58/46/orig        -> origin/gh/slayton58/46/orig
2025-12-04T08:57:06.1933254Z  * [new branch]              gh/slayton58/6/base         -> origin/gh/slayton58/6/base
2025-12-04T08:57:06.1934890Z  * [new branch]              gh/slayton58/6/head         -> origin/gh/slayton58/6/head
2025-12-04T08:57:06.1936874Z  * [new branch]              gh/slayton58/7/base         -> origin/gh/slayton58/7/base
2025-12-04T08:57:06.1938370Z  * [new branch]              gh/slayton58/7/head         -> origin/gh/slayton58/7/head
2025-12-04T08:57:06.1941469Z  * [new branch]              gh/soulitzer/269/base       -> origin/gh/soulitzer/269/base
2025-12-04T08:57:06.1943038Z  * [new branch]              gh/soulitzer/269/head       -> origin/gh/soulitzer/269/head
2025-12-04T08:57:06.1944687Z  * [new branch]              gh/soulitzer/269/orig       -> origin/gh/soulitzer/269/orig
2025-12-04T08:57:06.1946867Z  * [new branch]              gh/soulitzer/276/base       -> origin/gh/soulitzer/276/base
2025-12-04T08:57:06.1948450Z  * [new branch]              gh/soulitzer/276/head       -> origin/gh/soulitzer/276/head
2025-12-04T08:57:06.1950060Z  * [new branch]              gh/soulitzer/276/orig       -> origin/gh/soulitzer/276/orig
2025-12-04T08:57:06.1952442Z  * [new branch]              gh/soulitzer/287/base       -> origin/gh/soulitzer/287/base
2025-12-04T08:57:06.1954082Z  * [new branch]              gh/soulitzer/287/head       -> origin/gh/soulitzer/287/head
2025-12-04T08:57:06.1955753Z  * [new branch]              gh/soulitzer/287/orig       -> origin/gh/soulitzer/287/orig
2025-12-04T08:57:06.1958012Z  * [new branch]              gh/soulitzer/296/base       -> origin/gh/soulitzer/296/base
2025-12-04T08:57:06.1959648Z  * [new branch]              gh/soulitzer/296/head       -> origin/gh/soulitzer/296/head
2025-12-04T08:57:06.1961394Z  * [new branch]              gh/soulitzer/296/orig       -> origin/gh/soulitzer/296/orig
2025-12-04T08:57:06.1963639Z  * [new branch]              gh/soulitzer/299/base       -> origin/gh/soulitzer/299/base
2025-12-04T08:57:06.1965325Z  * [new branch]              gh/soulitzer/299/head       -> origin/gh/soulitzer/299/head
2025-12-04T08:57:06.1966892Z  * [new branch]              gh/soulitzer/299/orig       -> origin/gh/soulitzer/299/orig
2025-12-04T08:57:06.1968998Z  * [new branch]              gh/soulitzer/300/base       -> origin/gh/soulitzer/300/base
2025-12-04T08:57:06.1970708Z  * [new branch]              gh/soulitzer/300/head       -> origin/gh/soulitzer/300/head
2025-12-04T08:57:06.1972313Z  * [new branch]              gh/soulitzer/300/orig       -> origin/gh/soulitzer/300/orig
2025-12-04T08:57:06.1974508Z  * [new branch]              gh/soulitzer/301/base       -> origin/gh/soulitzer/301/base
2025-12-04T08:57:06.1976187Z  * [new branch]              gh/soulitzer/301/head       -> origin/gh/soulitzer/301/head
2025-12-04T08:57:06.1977778Z  * [new branch]              gh/soulitzer/301/orig       -> origin/gh/soulitzer/301/orig
2025-12-04T08:57:06.1979873Z  * [new branch]              gh/soulitzer/313/base       -> origin/gh/soulitzer/313/base
2025-12-04T08:57:06.1981413Z  * [new branch]              gh/soulitzer/313/head       -> origin/gh/soulitzer/313/head
2025-12-04T08:57:06.1983031Z  * [new branch]              gh/soulitzer/313/orig       -> origin/gh/soulitzer/313/orig
2025-12-04T08:57:06.1985322Z  * [new branch]              gh/soulitzer/319/base       -> origin/gh/soulitzer/319/base
2025-12-04T08:57:06.1986817Z  * [new branch]              gh/soulitzer/319/head       -> origin/gh/soulitzer/319/head
2025-12-04T08:57:06.1988360Z  * [new branch]              gh/soulitzer/319/orig       -> origin/gh/soulitzer/319/orig
2025-12-04T08:57:06.1990625Z  * [new branch]              gh/soulitzer/320/base       -> origin/gh/soulitzer/320/base
2025-12-04T08:57:06.1992145Z  * [new branch]              gh/soulitzer/320/head       -> origin/gh/soulitzer/320/head
2025-12-04T08:57:06.1993709Z  * [new branch]              gh/soulitzer/320/orig       -> origin/gh/soulitzer/320/orig
2025-12-04T08:57:06.1995895Z  * [new branch]              gh/soulitzer/336/base       -> origin/gh/soulitzer/336/base
2025-12-04T08:57:06.1997514Z  * [new branch]              gh/soulitzer/336/head       -> origin/gh/soulitzer/336/head
2025-12-04T08:57:06.1999061Z  * [new branch]              gh/soulitzer/336/orig       -> origin/gh/soulitzer/336/orig
2025-12-04T08:57:06.2001952Z  * [new branch]              gh/soulitzer/347/base       -> origin/gh/soulitzer/347/base
2025-12-04T08:57:06.2003457Z  * [new branch]              gh/soulitzer/347/head       -> origin/gh/soulitzer/347/head
2025-12-04T08:57:06.2005015Z  * [new branch]              gh/soulitzer/347/orig       -> origin/gh/soulitzer/347/orig
2025-12-04T08:57:06.2007294Z  * [new branch]              gh/soulitzer/349/base       -> origin/gh/soulitzer/349/base
2025-12-04T08:57:06.2008876Z  * [new branch]              gh/soulitzer/349/head       -> origin/gh/soulitzer/349/head
2025-12-04T08:57:06.2010398Z  * [new branch]              gh/soulitzer/349/orig       -> origin/gh/soulitzer/349/orig
2025-12-04T08:57:06.2012422Z  * [new branch]              gh/soulitzer/350/base       -> origin/gh/soulitzer/350/base
2025-12-04T08:57:06.2013942Z  * [new branch]              gh/soulitzer/350/head       -> origin/gh/soulitzer/350/head
2025-12-04T08:57:06.2015532Z  * [new branch]              gh/soulitzer/350/orig       -> origin/gh/soulitzer/350/orig
2025-12-04T08:57:06.2017929Z  * [new branch]              gh/soulitzer/351/base       -> origin/gh/soulitzer/351/base
2025-12-04T08:57:06.2019868Z  * [new branch]              gh/soulitzer/351/head       -> origin/gh/soulitzer/351/head
2025-12-04T08:57:06.2021823Z  * [new branch]              gh/soulitzer/351/orig       -> origin/gh/soulitzer/351/orig
2025-12-04T08:57:06.2023860Z  * [new branch]              gh/soulitzer/353/base       -> origin/gh/soulitzer/353/base
2025-12-04T08:57:06.2025573Z  * [new branch]              gh/soulitzer/353/head       -> origin/gh/soulitzer/353/head
2025-12-04T08:57:06.2027176Z  * [new branch]              gh/soulitzer/353/orig       -> origin/gh/soulitzer/353/orig
2025-12-04T08:57:06.2029925Z  * [new branch]              gh/soulitzer/358/base       -> origin/gh/soulitzer/358/base
2025-12-04T08:57:06.2031502Z  * [new branch]              gh/soulitzer/358/head       -> origin/gh/soulitzer/358/head
2025-12-04T08:57:06.2033136Z  * [new branch]              gh/soulitzer/358/orig       -> origin/gh/soulitzer/358/orig
2025-12-04T08:57:06.2035664Z  * [new branch]              gh/soulitzer/359/base       -> origin/gh/soulitzer/359/base
2025-12-04T08:57:06.2037347Z  * [new branch]              gh/soulitzer/359/head       -> origin/gh/soulitzer/359/head
2025-12-04T08:57:06.2039044Z  * [new branch]              gh/soulitzer/359/orig       -> origin/gh/soulitzer/359/orig
2025-12-04T08:57:06.2041429Z  * [new branch]              gh/soulitzer/374/base       -> origin/gh/soulitzer/374/base
2025-12-04T08:57:06.2043059Z  * [new branch]              gh/soulitzer/374/head       -> origin/gh/soulitzer/374/head
2025-12-04T08:57:06.2044643Z  * [new branch]              gh/soulitzer/374/orig       -> origin/gh/soulitzer/374/orig
2025-12-04T08:57:06.2046779Z  * [new branch]              gh/soulitzer/375/base       -> origin/gh/soulitzer/375/base
2025-12-04T08:57:06.2048328Z  * [new branch]              gh/soulitzer/375/head       -> origin/gh/soulitzer/375/head
2025-12-04T08:57:06.2049942Z  * [new branch]              gh/soulitzer/375/orig       -> origin/gh/soulitzer/375/orig
2025-12-04T08:57:06.2052017Z  * [new branch]              gh/soulitzer/380/base       -> origin/gh/soulitzer/380/base
2025-12-04T08:57:06.2053548Z  * [new branch]              gh/soulitzer/380/head       -> origin/gh/soulitzer/380/head
2025-12-04T08:57:06.2055122Z  * [new branch]              gh/soulitzer/380/orig       -> origin/gh/soulitzer/380/orig
2025-12-04T08:57:06.2057297Z  * [new branch]              gh/soulitzer/385/base       -> origin/gh/soulitzer/385/base
2025-12-04T08:57:06.2058899Z  * [new branch]              gh/soulitzer/385/head       -> origin/gh/soulitzer/385/head
2025-12-04T08:57:06.2060514Z  * [new branch]              gh/soulitzer/385/orig       -> origin/gh/soulitzer/385/orig
2025-12-04T08:57:06.2062731Z  * [new branch]              gh/soulitzer/386/base       -> origin/gh/soulitzer/386/base
2025-12-04T08:57:06.2064389Z  * [new branch]              gh/soulitzer/386/head       -> origin/gh/soulitzer/386/head
2025-12-04T08:57:06.2065953Z  * [new branch]              gh/soulitzer/386/orig       -> origin/gh/soulitzer/386/orig
2025-12-04T08:57:06.2068174Z  * [new branch]              gh/soulitzer/387/base       -> origin/gh/soulitzer/387/base
2025-12-04T08:57:06.2069726Z  * [new branch]              gh/soulitzer/387/head       -> origin/gh/soulitzer/387/head
2025-12-04T08:57:06.2071316Z  * [new branch]              gh/soulitzer/387/orig       -> origin/gh/soulitzer/387/orig
2025-12-04T08:57:06.2073483Z  * [new branch]              gh/soulitzer/388/base       -> origin/gh/soulitzer/388/base
2025-12-04T08:57:06.2075107Z  * [new branch]              gh/soulitzer/388/head       -> origin/gh/soulitzer/388/head
2025-12-04T08:57:06.2076655Z  * [new branch]              gh/soulitzer/388/orig       -> origin/gh/soulitzer/388/orig
2025-12-04T08:57:06.2078795Z  * [new branch]              gh/soulitzer/389/base       -> origin/gh/soulitzer/389/base
2025-12-04T08:57:06.2080448Z  * [new branch]              gh/soulitzer/389/head       -> origin/gh/soulitzer/389/head
2025-12-04T08:57:06.2082002Z  * [new branch]              gh/soulitzer/389/orig       -> origin/gh/soulitzer/389/orig
2025-12-04T08:57:06.2084175Z  * [new branch]              gh/soulitzer/390/base       -> origin/gh/soulitzer/390/base
2025-12-04T08:57:06.2085816Z  * [new branch]              gh/soulitzer/390/head       -> origin/gh/soulitzer/390/head
2025-12-04T08:57:06.2087378Z  * [new branch]              gh/soulitzer/390/orig       -> origin/gh/soulitzer/390/orig
2025-12-04T08:57:06.2089677Z  * [new branch]              gh/soulitzer/391/base       -> origin/gh/soulitzer/391/base
2025-12-04T08:57:06.2091301Z  * [new branch]              gh/soulitzer/391/head       -> origin/gh/soulitzer/391/head
2025-12-04T08:57:06.2092887Z  * [new branch]              gh/soulitzer/391/orig       -> origin/gh/soulitzer/391/orig
2025-12-04T08:57:06.2095033Z  * [new branch]              gh/soulitzer/392/base       -> origin/gh/soulitzer/392/base
2025-12-04T08:57:06.2096613Z  * [new branch]              gh/soulitzer/392/head       -> origin/gh/soulitzer/392/head
2025-12-04T08:57:06.2098168Z  * [new branch]              gh/soulitzer/392/orig       -> origin/gh/soulitzer/392/orig
2025-12-04T08:57:06.2100652Z  * [new branch]              gh/swolchok/728/next        -> origin/gh/swolchok/728/next
2025-12-04T08:57:06.2102938Z  * [new branch]              gh/swolchok/819/base        -> origin/gh/swolchok/819/base
2025-12-04T08:57:06.2104526Z  * [new branch]              gh/swolchok/819/head        -> origin/gh/swolchok/819/head
2025-12-04T08:57:06.2106176Z  * [new branch]              gh/swolchok/819/orig        -> origin/gh/swolchok/819/orig
2025-12-04T08:57:06.2108338Z  * [new branch]              gh/swolchok/824/base        -> origin/gh/swolchok/824/base
2025-12-04T08:57:06.2109869Z  * [new branch]              gh/swolchok/824/head        -> origin/gh/swolchok/824/head
2025-12-04T08:57:06.2111442Z  * [new branch]              gh/swolchok/824/orig        -> origin/gh/swolchok/824/orig
2025-12-04T08:57:06.2113665Z  * [new branch]              gh/swolchok/829/base        -> origin/gh/swolchok/829/base
2025-12-04T08:57:06.2115176Z  * [new branch]              gh/swolchok/829/head        -> origin/gh/swolchok/829/head
2025-12-04T08:57:06.2116728Z  * [new branch]              gh/swolchok/829/orig        -> origin/gh/swolchok/829/orig
2025-12-04T08:57:06.2119230Z  * [new branch]              gh/swolchok/839/base        -> origin/gh/swolchok/839/base
2025-12-04T08:57:06.2120825Z  * [new branch]              gh/swolchok/839/head        -> origin/gh/swolchok/839/head
2025-12-04T08:57:06.2122435Z  * [new branch]              gh/swolchok/839/orig        -> origin/gh/swolchok/839/orig
2025-12-04T08:57:06.2124463Z  * [new branch]              gh/swolchok/841/base        -> origin/gh/swolchok/841/base
2025-12-04T08:57:06.2126028Z  * [new branch]              gh/swolchok/841/head        -> origin/gh/swolchok/841/head
2025-12-04T08:57:06.2127722Z  * [new branch]              gh/swolchok/841/orig        -> origin/gh/swolchok/841/orig
2025-12-04T08:57:06.2129861Z  * [new branch]              gh/swolchok/842/base        -> origin/gh/swolchok/842/base
2025-12-04T08:57:06.2131438Z  * [new branch]              gh/swolchok/842/head        -> origin/gh/swolchok/842/head
2025-12-04T08:57:06.2133027Z  * [new branch]              gh/swolchok/842/orig        -> origin/gh/swolchok/842/orig
2025-12-04T08:57:06.2135166Z  * [new branch]              gh/swolchok/845/base        -> origin/gh/swolchok/845/base
2025-12-04T08:57:06.2136778Z  * [new branch]              gh/swolchok/845/head        -> origin/gh/swolchok/845/head
2025-12-04T08:57:06.2138397Z  * [new branch]              gh/swolchok/845/orig        -> origin/gh/swolchok/845/orig
2025-12-04T08:57:06.2140557Z  * [new branch]              gh/swolchok/848/base        -> origin/gh/swolchok/848/base
2025-12-04T08:57:06.2142293Z  * [new branch]              gh/swolchok/848/head        -> origin/gh/swolchok/848/head
2025-12-04T08:57:06.2143890Z  * [new branch]              gh/swolchok/848/orig        -> origin/gh/swolchok/848/orig
2025-12-04T08:57:06.2145991Z  * [new branch]              gh/swolchok/856/base        -> origin/gh/swolchok/856/base
2025-12-04T08:57:06.2147576Z  * [new branch]              gh/swolchok/856/head        -> origin/gh/swolchok/856/head
2025-12-04T08:57:06.2149316Z  * [new branch]              gh/swolchok/856/orig        -> origin/gh/swolchok/856/orig
2025-12-04T08:57:06.2151528Z  * [new branch]              gh/swolchok/860/base        -> origin/gh/swolchok/860/base
2025-12-04T08:57:06.2153159Z  * [new branch]              gh/swolchok/860/head        -> origin/gh/swolchok/860/head
2025-12-04T08:57:06.2154645Z  * [new branch]              gh/swolchok/860/orig        -> origin/gh/swolchok/860/orig
2025-12-04T08:57:06.2156975Z  * [new branch]              gh/swolchok/861/base        -> origin/gh/swolchok/861/base
2025-12-04T08:57:06.2158624Z  * [new branch]              gh/swolchok/861/head        -> origin/gh/swolchok/861/head
2025-12-04T08:57:06.2160188Z  * [new branch]              gh/swolchok/861/orig        -> origin/gh/swolchok/861/orig
2025-12-04T08:57:06.2162429Z  * [new branch]              gh/swolchok/862/base        -> origin/gh/swolchok/862/base
2025-12-04T08:57:06.2163892Z  * [new branch]              gh/swolchok/862/head        -> origin/gh/swolchok/862/head
2025-12-04T08:57:06.2165461Z  * [new branch]              gh/swolchok/862/orig        -> origin/gh/swolchok/862/orig
2025-12-04T08:57:06.2167701Z  * [new branch]              gh/swolchok/863/base        -> origin/gh/swolchok/863/base
2025-12-04T08:57:06.2169732Z  * [new branch]              gh/swolchok/863/head        -> origin/gh/swolchok/863/head
2025-12-04T08:57:06.2171469Z  * [new branch]              gh/swolchok/863/orig        -> origin/gh/swolchok/863/orig
2025-12-04T08:57:06.2173685Z  * [new branch]              gh/swolchok/864/base        -> origin/gh/swolchok/864/base
2025-12-04T08:57:06.2175295Z  * [new branch]              gh/swolchok/864/head        -> origin/gh/swolchok/864/head
2025-12-04T08:57:06.2177011Z  * [new branch]              gh/swolchok/864/orig        -> origin/gh/swolchok/864/orig
2025-12-04T08:57:06.2179058Z  * [new branch]              gh/swolchok/865/base        -> origin/gh/swolchok/865/base
2025-12-04T08:57:06.2180754Z  * [new branch]              gh/swolchok/865/head        -> origin/gh/swolchok/865/head
2025-12-04T08:57:06.2182329Z  * [new branch]              gh/swolchok/865/orig        -> origin/gh/swolchok/865/orig
2025-12-04T08:57:06.2185030Z  * [new branch]              gh/swolchok/866/base        -> origin/gh/swolchok/866/base
2025-12-04T08:57:06.2186618Z  * [new branch]              gh/swolchok/866/head        -> origin/gh/swolchok/866/head
2025-12-04T08:57:06.2188284Z  * [new branch]              gh/swolchok/866/orig        -> origin/gh/swolchok/866/orig
2025-12-04T08:57:06.2190400Z  * [new branch]              gh/swolchok/867/base        -> origin/gh/swolchok/867/base
2025-12-04T08:57:06.2192103Z  * [new branch]              gh/swolchok/867/head        -> origin/gh/swolchok/867/head
2025-12-04T08:57:06.2193754Z  * [new branch]              gh/swolchok/867/orig        -> origin/gh/swolchok/867/orig
2025-12-04T08:57:06.2195916Z  * [new branch]              gh/swolchok/868/base        -> origin/gh/swolchok/868/base
2025-12-04T08:57:06.2197539Z  * [new branch]              gh/swolchok/868/head        -> origin/gh/swolchok/868/head
2025-12-04T08:57:06.2199236Z  * [new branch]              gh/swolchok/868/orig        -> origin/gh/swolchok/868/orig
2025-12-04T08:57:06.2201538Z  * [new branch]              gh/swolchok/869/base        -> origin/gh/swolchok/869/base
2025-12-04T08:57:06.2203114Z  * [new branch]              gh/swolchok/869/head        -> origin/gh/swolchok/869/head
2025-12-04T08:57:06.2204711Z  * [new branch]              gh/swolchok/869/orig        -> origin/gh/swolchok/869/orig
2025-12-04T08:57:06.2206959Z  * [new branch]              gh/swolchok/870/base        -> origin/gh/swolchok/870/base
2025-12-04T08:57:06.2208514Z  * [new branch]              gh/swolchok/870/head        -> origin/gh/swolchok/870/head
2025-12-04T08:57:06.2209986Z  * [new branch]              gh/swolchok/870/orig        -> origin/gh/swolchok/870/orig
2025-12-04T08:57:06.2212221Z  * [new branch]              gh/swolchok/871/base        -> origin/gh/swolchok/871/base
2025-12-04T08:57:06.2213817Z  * [new branch]              gh/swolchok/871/head        -> origin/gh/swolchok/871/head
2025-12-04T08:57:06.2215545Z  * [new branch]              gh/swolchok/871/orig        -> origin/gh/swolchok/871/orig
2025-12-04T08:57:06.2220009Z  * [new branch]              gh/teja-rao/4/base          -> origin/gh/teja-rao/4/base
2025-12-04T08:57:06.2221629Z  * [new branch]              gh/teja-rao/4/head          -> origin/gh/teja-rao/4/head
2025-12-04T08:57:06.2223233Z  * [new branch]              gh/teja-rao/4/orig          -> origin/gh/teja-rao/4/orig
2025-12-04T08:57:06.2225793Z  * [new branch]              gh/tianyu-l/2/base          -> origin/gh/tianyu-l/2/base
2025-12-04T08:57:06.2227387Z  * [new branch]              gh/tianyu-l/2/head          -> origin/gh/tianyu-l/2/head
2025-12-04T08:57:06.2228988Z  * [new branch]              gh/tianyu-l/2/orig          -> origin/gh/tianyu-l/2/orig
2025-12-04T08:57:06.2231074Z  * [new branch]              gh/tianyu-l/3/base          -> origin/gh/tianyu-l/3/base
2025-12-04T08:57:06.2232768Z  * [new branch]              gh/tianyu-l/3/orig          -> origin/gh/tianyu-l/3/orig
2025-12-04T08:57:06.2234798Z  * [new branch]              gh/tianyu-l/4/base          -> origin/gh/tianyu-l/4/base
2025-12-04T08:57:06.2236394Z  * [new branch]              gh/tianyu-l/4/head          -> origin/gh/tianyu-l/4/head
2025-12-04T08:57:06.2237990Z  * [new branch]              gh/tianyu-l/4/orig          -> origin/gh/tianyu-l/4/orig
2025-12-04T08:57:06.2241053Z  * [new branch]              gh/tugsbayasgalan/10/base   -> origin/gh/tugsbayasgalan/10/base
2025-12-04T08:57:06.2242662Z  * [new branch]              gh/tugsbayasgalan/10/head   -> origin/gh/tugsbayasgalan/10/head
2025-12-04T08:57:06.2244440Z  * [new branch]              gh/tugsbayasgalan/10/orig   -> origin/gh/tugsbayasgalan/10/orig
2025-12-04T08:57:06.2246570Z  * [new branch]              gh/tugsbayasgalan/13/base   -> origin/gh/tugsbayasgalan/13/base
2025-12-04T08:57:06.2248124Z  * [new branch]              gh/tugsbayasgalan/13/head   -> origin/gh/tugsbayasgalan/13/head
2025-12-04T08:57:06.2249664Z  * [new branch]              gh/tugsbayasgalan/13/orig   -> origin/gh/tugsbayasgalan/13/orig
2025-12-04T08:57:06.2251879Z  * [new branch]              gh/tugsbayasgalan/17/base   -> origin/gh/tugsbayasgalan/17/base
2025-12-04T08:57:06.2253401Z  * [new branch]              gh/tugsbayasgalan/17/head   -> origin/gh/tugsbayasgalan/17/head
2025-12-04T08:57:06.2255042Z  * [new branch]              gh/tugsbayasgalan/17/orig   -> origin/gh/tugsbayasgalan/17/orig
2025-12-04T08:57:06.2257281Z  * [new branch]              gh/tugsbayasgalan/2/base    -> origin/gh/tugsbayasgalan/2/base
2025-12-04T08:57:06.2258907Z  * [new branch]              gh/tugsbayasgalan/2/head    -> origin/gh/tugsbayasgalan/2/head
2025-12-04T08:57:06.2260432Z  * [new branch]              gh/tugsbayasgalan/2/orig    -> origin/gh/tugsbayasgalan/2/orig
2025-12-04T08:57:06.2262757Z  * [new branch]              gh/tugsbayasgalan/28/base   -> origin/gh/tugsbayasgalan/28/base
2025-12-04T08:57:06.2264409Z  * [new branch]              gh/tugsbayasgalan/28/head   -> origin/gh/tugsbayasgalan/28/head
2025-12-04T08:57:06.2265985Z  * [new branch]              gh/tugsbayasgalan/28/orig   -> origin/gh/tugsbayasgalan/28/orig
2025-12-04T08:57:06.2268083Z  * [new branch]              gh/tugsbayasgalan/32/base   -> origin/gh/tugsbayasgalan/32/base
2025-12-04T08:57:06.2269686Z  * [new branch]              gh/tugsbayasgalan/32/head   -> origin/gh/tugsbayasgalan/32/head
2025-12-04T08:57:06.2271301Z  * [new branch]              gh/tugsbayasgalan/32/orig   -> origin/gh/tugsbayasgalan/32/orig
2025-12-04T08:57:06.2273501Z  * [new branch]              gh/tugsbayasgalan/35/base   -> origin/gh/tugsbayasgalan/35/base
2025-12-04T08:57:06.2275114Z  * [new branch]              gh/tugsbayasgalan/35/head   -> origin/gh/tugsbayasgalan/35/head
2025-12-04T08:57:06.2276744Z  * [new branch]              gh/tugsbayasgalan/35/orig   -> origin/gh/tugsbayasgalan/35/orig
2025-12-04T08:57:06.2278922Z  * [new branch]              gh/tugsbayasgalan/36/base   -> origin/gh/tugsbayasgalan/36/base
2025-12-04T08:57:06.2280617Z  * [new branch]              gh/tugsbayasgalan/36/head   -> origin/gh/tugsbayasgalan/36/head
2025-12-04T08:57:06.2282202Z  * [new branch]              gh/tugsbayasgalan/36/orig   -> origin/gh/tugsbayasgalan/36/orig
2025-12-04T08:57:06.2284322Z  * [new branch]              gh/tugsbayasgalan/37/base   -> origin/gh/tugsbayasgalan/37/base
2025-12-04T08:57:06.2285862Z  * [new branch]              gh/tugsbayasgalan/37/head   -> origin/gh/tugsbayasgalan/37/head
2025-12-04T08:57:06.2287436Z  * [new branch]              gh/tugsbayasgalan/37/orig   -> origin/gh/tugsbayasgalan/37/orig
2025-12-04T08:57:06.2289592Z  * [new branch]              gh/tugsbayasgalan/43/base   -> origin/gh/tugsbayasgalan/43/base
2025-12-04T08:57:06.2291095Z  * [new branch]              gh/tugsbayasgalan/43/head   -> origin/gh/tugsbayasgalan/43/head
2025-12-04T08:57:06.2292658Z  * [new branch]              gh/tugsbayasgalan/43/orig   -> origin/gh/tugsbayasgalan/43/orig
2025-12-04T08:57:06.2294676Z  * [new branch]              gh/tugsbayasgalan/48/base   -> origin/gh/tugsbayasgalan/48/base
2025-12-04T08:57:06.2296409Z  * [new branch]              gh/tugsbayasgalan/48/head   -> origin/gh/tugsbayasgalan/48/head
2025-12-04T08:57:06.2297954Z  * [new branch]              gh/tugsbayasgalan/48/orig   -> origin/gh/tugsbayasgalan/48/orig
2025-12-04T08:57:06.2300138Z  * [new branch]              gh/tugsbayasgalan/51/base   -> origin/gh/tugsbayasgalan/51/base
2025-12-04T08:57:06.2301742Z  * [new branch]              gh/tugsbayasgalan/51/head   -> origin/gh/tugsbayasgalan/51/head
2025-12-04T08:57:06.2303458Z  * [new branch]              gh/tugsbayasgalan/51/orig   -> origin/gh/tugsbayasgalan/51/orig
2025-12-04T08:57:06.2305368Z  * [new branch]              gh/tugsbayasgalan/52/base   -> origin/gh/tugsbayasgalan/52/base
2025-12-04T08:57:06.2306962Z  * [new branch]              gh/tugsbayasgalan/52/head   -> origin/gh/tugsbayasgalan/52/head
2025-12-04T08:57:06.2308579Z  * [new branch]              gh/tugsbayasgalan/52/orig   -> origin/gh/tugsbayasgalan/52/orig
2025-12-04T08:57:06.2310724Z  * [new branch]              gh/tugsbayasgalan/53/base   -> origin/gh/tugsbayasgalan/53/base
2025-12-04T08:57:06.2312283Z  * [new branch]              gh/tugsbayasgalan/53/head   -> origin/gh/tugsbayasgalan/53/head
2025-12-04T08:57:06.2313866Z  * [new branch]              gh/tugsbayasgalan/53/orig   -> origin/gh/tugsbayasgalan/53/orig
2025-12-04T08:57:06.2316098Z  * [new branch]              gh/tugsbayasgalan/55/base   -> origin/gh/tugsbayasgalan/55/base
2025-12-04T08:57:06.2317961Z  * [new branch]              gh/tugsbayasgalan/55/head   -> origin/gh/tugsbayasgalan/55/head
2025-12-04T08:57:06.2319618Z  * [new branch]              gh/tugsbayasgalan/55/orig   -> origin/gh/tugsbayasgalan/55/orig
2025-12-04T08:57:06.2322162Z  * [new branch]              gh/tugsbayasgalan/59/base   -> origin/gh/tugsbayasgalan/59/base
2025-12-04T08:57:06.2323751Z  * [new branch]              gh/tugsbayasgalan/59/head   -> origin/gh/tugsbayasgalan/59/head
2025-12-04T08:57:06.2325266Z  * [new branch]              gh/tugsbayasgalan/59/orig   -> origin/gh/tugsbayasgalan/59/orig
2025-12-04T08:57:06.2327347Z  * [new branch]              gh/tugsbayasgalan/6/base    -> origin/gh/tugsbayasgalan/6/base
2025-12-04T08:57:06.2328841Z  * [new branch]              gh/tugsbayasgalan/6/head    -> origin/gh/tugsbayasgalan/6/head
2025-12-04T08:57:06.2330577Z  * [new branch]              gh/tugsbayasgalan/6/orig    -> origin/gh/tugsbayasgalan/6/orig
2025-12-04T08:57:06.2332540Z  * [new branch]              gh/tugsbayasgalan/60/base   -> origin/gh/tugsbayasgalan/60/base
2025-12-04T08:57:06.2334186Z  * [new branch]              gh/tugsbayasgalan/60/head   -> origin/gh/tugsbayasgalan/60/head
2025-12-04T08:57:06.2335765Z  * [new branch]              gh/tugsbayasgalan/60/orig   -> origin/gh/tugsbayasgalan/60/orig
2025-12-04T08:57:06.2338409Z  * [new branch]              gh/tugsbayasgalan/61/base   -> origin/gh/tugsbayasgalan/61/base
2025-12-04T08:57:06.2339916Z  * [new branch]              gh/tugsbayasgalan/61/head   -> origin/gh/tugsbayasgalan/61/head
2025-12-04T08:57:06.2341455Z  * [new branch]              gh/tugsbayasgalan/61/orig   -> origin/gh/tugsbayasgalan/61/orig
2025-12-04T08:57:06.2343815Z  * [new branch]              gh/tugsbayasgalan/63/base   -> origin/gh/tugsbayasgalan/63/base
2025-12-04T08:57:06.2345393Z  * [new branch]              gh/tugsbayasgalan/63/head   -> origin/gh/tugsbayasgalan/63/head
2025-12-04T08:57:06.2347048Z  * [new branch]              gh/tugsbayasgalan/63/orig   -> origin/gh/tugsbayasgalan/63/orig
2025-12-04T08:57:06.2349199Z  * [new branch]              gh/tugsbayasgalan/67/base   -> origin/gh/tugsbayasgalan/67/base
2025-12-04T08:57:06.2350765Z  * [new branch]              gh/tugsbayasgalan/67/head   -> origin/gh/tugsbayasgalan/67/head
2025-12-04T08:57:06.2352398Z  * [new branch]              gh/tugsbayasgalan/67/orig   -> origin/gh/tugsbayasgalan/67/orig
2025-12-04T08:57:06.2354706Z  * [new branch]              gh/tugsbayasgalan/68/base   -> origin/gh/tugsbayasgalan/68/base
2025-12-04T08:57:06.2356281Z  * [new branch]              gh/tugsbayasgalan/68/head   -> origin/gh/tugsbayasgalan/68/head
2025-12-04T08:57:06.2357868Z  * [new branch]              gh/tugsbayasgalan/68/orig   -> origin/gh/tugsbayasgalan/68/orig
2025-12-04T08:57:06.2360171Z  * [new branch]              gh/tugsbayasgalan/7/base    -> origin/gh/tugsbayasgalan/7/base
2025-12-04T08:57:06.2361801Z  * [new branch]              gh/tugsbayasgalan/7/head    -> origin/gh/tugsbayasgalan/7/head
2025-12-04T08:57:06.2363399Z  * [new branch]              gh/tugsbayasgalan/7/orig    -> origin/gh/tugsbayasgalan/7/orig
2025-12-04T08:57:06.2366041Z  * [new branch]              gh/tugsbayasgalan/70/base   -> origin/gh/tugsbayasgalan/70/base
2025-12-04T08:57:06.2367644Z  * [new branch]              gh/tugsbayasgalan/70/head   -> origin/gh/tugsbayasgalan/70/head
2025-12-04T08:57:06.2369197Z  * [new branch]              gh/tugsbayasgalan/70/orig   -> origin/gh/tugsbayasgalan/70/orig
2025-12-04T08:57:06.2371753Z  * [new branch]              gh/tugsbayasgalan/71/base   -> origin/gh/tugsbayasgalan/71/base
2025-12-04T08:57:06.2373627Z  * [new branch]              gh/tugsbayasgalan/71/head   -> origin/gh/tugsbayasgalan/71/head
2025-12-04T08:57:06.2375324Z  * [new branch]              gh/tugsbayasgalan/71/orig   -> origin/gh/tugsbayasgalan/71/orig
2025-12-04T08:57:06.2377556Z  * [new branch]              gh/tugsbayasgalan/72/base   -> origin/gh/tugsbayasgalan/72/base
2025-12-04T08:57:06.2379180Z  * [new branch]              gh/tugsbayasgalan/72/head   -> origin/gh/tugsbayasgalan/72/head
2025-12-04T08:57:06.2380754Z  * [new branch]              gh/tugsbayasgalan/72/orig   -> origin/gh/tugsbayasgalan/72/orig
2025-12-04T08:57:06.2383267Z  * [new branch]              gh/tugsbayasgalan/73/base   -> origin/gh/tugsbayasgalan/73/base
2025-12-04T08:57:06.2384854Z  * [new branch]              gh/tugsbayasgalan/73/head   -> origin/gh/tugsbayasgalan/73/head
2025-12-04T08:57:06.2386491Z  * [new branch]              gh/tugsbayasgalan/73/orig   -> origin/gh/tugsbayasgalan/73/orig
2025-12-04T08:57:06.2389384Z  * [new branch]              gh/tugsbayasgalan/74/base   -> origin/gh/tugsbayasgalan/74/base
2025-12-04T08:57:06.2404259Z  * [new branch]              gh/tugsbayasgalan/74/head   -> origin/gh/tugsbayasgalan/74/head
2025-12-04T08:57:06.2404465Z  * [new branch]              gh/tugsbayasgalan/74/orig   -> origin/gh/tugsbayasgalan/74/orig
2025-12-04T08:57:06.2404641Z  * [new branch]              gh/tugsbayasgalan/75/base   -> origin/gh/tugsbayasgalan/75/base
2025-12-04T08:57:06.2404821Z  * [new branch]              gh/tugsbayasgalan/75/head   -> origin/gh/tugsbayasgalan/75/head
2025-12-04T08:57:06.2404996Z  * [new branch]              gh/tugsbayasgalan/75/orig   -> origin/gh/tugsbayasgalan/75/orig
2025-12-04T08:57:06.2405172Z  * [new branch]              gh/tugsbayasgalan/76/base   -> origin/gh/tugsbayasgalan/76/base
2025-12-04T08:57:06.2405470Z  * [new branch]              gh/tugsbayasgalan/76/head   -> origin/gh/tugsbayasgalan/76/head
2025-12-04T08:57:06.2405767Z  * [new branch]              gh/tugsbayasgalan/76/orig   -> origin/gh/tugsbayasgalan/76/orig
2025-12-04T08:57:06.2405946Z  * [new branch]              gh/tugsbayasgalan/77/base   -> origin/gh/tugsbayasgalan/77/base
2025-12-04T08:57:06.2407665Z  * [new branch]              gh/tugsbayasgalan/77/head   -> origin/gh/tugsbayasgalan/77/head
2025-12-04T08:57:06.2409372Z  * [new branch]              gh/tugsbayasgalan/77/orig   -> origin/gh/tugsbayasgalan/77/orig
2025-12-04T08:57:06.2411737Z  * [new branch]              gh/tugsbayasgalan/78/base   -> origin/gh/tugsbayasgalan/78/base
2025-12-04T08:57:06.2413534Z  * [new branch]              gh/tugsbayasgalan/78/head   -> origin/gh/tugsbayasgalan/78/head
2025-12-04T08:57:06.2415088Z  * [new branch]              gh/tugsbayasgalan/78/orig   -> origin/gh/tugsbayasgalan/78/orig
2025-12-04T08:57:06.2417415Z  * [new branch]              gh/tugsbayasgalan/79/base   -> origin/gh/tugsbayasgalan/79/base
2025-12-04T08:57:06.2420371Z  * [new branch]              gh/tugsbayasgalan/79/head   -> origin/gh/tugsbayasgalan/79/head
2025-12-04T08:57:06.2421906Z  * [new branch]              gh/tugsbayasgalan/79/orig   -> origin/gh/tugsbayasgalan/79/orig
2025-12-04T08:57:06.2424091Z  * [new branch]              gh/tugsbayasgalan/8/base    -> origin/gh/tugsbayasgalan/8/base
2025-12-04T08:57:06.2425632Z  * [new branch]              gh/tugsbayasgalan/8/head    -> origin/gh/tugsbayasgalan/8/head
2025-12-04T08:57:06.2427253Z  * [new branch]              gh/tugsbayasgalan/8/orig    -> origin/gh/tugsbayasgalan/8/orig
2025-12-04T08:57:06.2429512Z  * [new branch]              gh/tugsbayasgalan/80/base   -> origin/gh/tugsbayasgalan/80/base
2025-12-04T08:57:06.2430945Z  * [new branch]              gh/tugsbayasgalan/80/head   -> origin/gh/tugsbayasgalan/80/head
2025-12-04T08:57:06.2432453Z  * [new branch]              gh/tugsbayasgalan/80/orig   -> origin/gh/tugsbayasgalan/80/orig
2025-12-04T08:57:06.2435128Z  * [new branch]              gh/tugsbayasgalan/81/base   -> origin/gh/tugsbayasgalan/81/base
2025-12-04T08:57:06.2436646Z  * [new branch]              gh/tugsbayasgalan/81/head   -> origin/gh/tugsbayasgalan/81/head
2025-12-04T08:57:06.2438221Z  * [new branch]              gh/tugsbayasgalan/81/orig   -> origin/gh/tugsbayasgalan/81/orig
2025-12-04T08:57:06.2441424Z  * [new branch]              gh/tugsbayasgalan/82/base   -> origin/gh/tugsbayasgalan/82/base
2025-12-04T08:57:06.2443184Z  * [new branch]              gh/tugsbayasgalan/82/head   -> origin/gh/tugsbayasgalan/82/head
2025-12-04T08:57:06.2444771Z  * [new branch]              gh/tugsbayasgalan/82/orig   -> origin/gh/tugsbayasgalan/82/orig
2025-12-04T08:57:06.2446839Z  * [new branch]              gh/tugsbayasgalan/83/base   -> origin/gh/tugsbayasgalan/83/base
2025-12-04T08:57:06.2448526Z  * [new branch]              gh/tugsbayasgalan/83/head   -> origin/gh/tugsbayasgalan/83/head
2025-12-04T08:57:06.2450239Z  * [new branch]              gh/tugsbayasgalan/83/orig   -> origin/gh/tugsbayasgalan/83/orig
2025-12-04T08:57:06.2452231Z  * [new branch]              gh/tugsbayasgalan/84/base   -> origin/gh/tugsbayasgalan/84/base
2025-12-04T08:57:06.2453835Z  * [new branch]              gh/tugsbayasgalan/84/head   -> origin/gh/tugsbayasgalan/84/head
2025-12-04T08:57:06.2455409Z  * [new branch]              gh/tugsbayasgalan/84/orig   -> origin/gh/tugsbayasgalan/84/orig
2025-12-04T08:57:06.2457573Z  * [new branch]              gh/tugsbayasgalan/85/base   -> origin/gh/tugsbayasgalan/85/base
2025-12-04T08:57:06.2459117Z  * [new branch]              gh/tugsbayasgalan/85/head   -> origin/gh/tugsbayasgalan/85/head
2025-12-04T08:57:06.2460774Z  * [new branch]              gh/tugsbayasgalan/85/orig   -> origin/gh/tugsbayasgalan/85/orig
2025-12-04T08:57:06.2462883Z  * [new branch]              gh/tugsbayasgalan/86/base   -> origin/gh/tugsbayasgalan/86/base
2025-12-04T08:57:06.2464552Z  * [new branch]              gh/tugsbayasgalan/86/head   -> origin/gh/tugsbayasgalan/86/head
2025-12-04T08:57:06.2466089Z  * [new branch]              gh/tugsbayasgalan/86/orig   -> origin/gh/tugsbayasgalan/86/orig
2025-12-04T08:57:06.2468492Z  * [new branch]              gh/tugsbayasgalan/87/base   -> origin/gh/tugsbayasgalan/87/base
2025-12-04T08:57:06.2470493Z  * [new branch]              gh/tugsbayasgalan/87/head   -> origin/gh/tugsbayasgalan/87/head
2025-12-04T08:57:06.2472101Z  * [new branch]              gh/tugsbayasgalan/87/orig   -> origin/gh/tugsbayasgalan/87/orig
2025-12-04T08:57:06.2474355Z  * [new branch]              gh/tugsbayasgalan/88/base   -> origin/gh/tugsbayasgalan/88/base
2025-12-04T08:57:06.2475884Z  * [new branch]              gh/tugsbayasgalan/88/head   -> origin/gh/tugsbayasgalan/88/head
2025-12-04T08:57:06.2477604Z  * [new branch]              gh/tugsbayasgalan/88/orig   -> origin/gh/tugsbayasgalan/88/orig
2025-12-04T08:57:06.2480011Z  * [new branch]              gh/tugsbayasgalan/89/base   -> origin/gh/tugsbayasgalan/89/base
2025-12-04T08:57:06.2481738Z  * [new branch]              gh/tugsbayasgalan/89/head   -> origin/gh/tugsbayasgalan/89/head
2025-12-04T08:57:06.2483305Z  * [new branch]              gh/tugsbayasgalan/89/orig   -> origin/gh/tugsbayasgalan/89/orig
2025-12-04T08:57:06.2485481Z  * [new branch]              gh/tugsbayasgalan/9/base    -> origin/gh/tugsbayasgalan/9/base
2025-12-04T08:57:06.2486934Z  * [new branch]              gh/tugsbayasgalan/9/head    -> origin/gh/tugsbayasgalan/9/head
2025-12-04T08:57:06.2488485Z  * [new branch]              gh/tugsbayasgalan/9/orig    -> origin/gh/tugsbayasgalan/9/orig
2025-12-04T08:57:06.2490660Z  * [new branch]              gh/tugsbayasgalan/90/base   -> origin/gh/tugsbayasgalan/90/base
2025-12-04T08:57:06.2492390Z  * [new branch]              gh/tugsbayasgalan/90/head   -> origin/gh/tugsbayasgalan/90/head
2025-12-04T08:57:06.2493822Z  * [new branch]              gh/tugsbayasgalan/90/orig   -> origin/gh/tugsbayasgalan/90/orig
2025-12-04T08:57:06.2496494Z  * [new branch]              gh/tugsbayasgalan/91/base   -> origin/gh/tugsbayasgalan/91/base
2025-12-04T08:57:06.2498096Z  * [new branch]              gh/tugsbayasgalan/91/head   -> origin/gh/tugsbayasgalan/91/head
2025-12-04T08:57:06.2499715Z  * [new branch]              gh/tugsbayasgalan/91/orig   -> origin/gh/tugsbayasgalan/91/orig
2025-12-04T08:57:06.2502026Z  * [new branch]              gh/tugsbayasgalan/92/base   -> origin/gh/tugsbayasgalan/92/base
2025-12-04T08:57:06.2503680Z  * [new branch]              gh/tugsbayasgalan/92/head   -> origin/gh/tugsbayasgalan/92/head
2025-12-04T08:57:06.2505222Z  * [new branch]              gh/tugsbayasgalan/92/orig   -> origin/gh/tugsbayasgalan/92/orig
2025-12-04T08:57:06.2507935Z  * [new branch]              gh/tugsbayasgalan/93/base   -> origin/gh/tugsbayasgalan/93/base
2025-12-04T08:57:06.2509644Z  * [new branch]              gh/tugsbayasgalan/93/head   -> origin/gh/tugsbayasgalan/93/head
2025-12-04T08:57:06.2511220Z  * [new branch]              gh/tugsbayasgalan/93/orig   -> origin/gh/tugsbayasgalan/93/orig
2025-12-04T08:57:06.2513884Z  * [new branch]              gh/v0i0/14/base             -> origin/gh/v0i0/14/base
2025-12-04T08:57:06.2515370Z  * [new branch]              gh/v0i0/14/head             -> origin/gh/v0i0/14/head
2025-12-04T08:57:06.2516885Z  * [new branch]              gh/v0i0/14/orig             -> origin/gh/v0i0/14/orig
2025-12-04T08:57:06.2519120Z  * [new branch]              gh/v0i0/15/base             -> origin/gh/v0i0/15/base
2025-12-04T08:57:06.2520794Z  * [new branch]              gh/v0i0/15/head             -> origin/gh/v0i0/15/head
2025-12-04T08:57:06.2522394Z  * [new branch]              gh/v0i0/15/orig             -> origin/gh/v0i0/15/orig
2025-12-04T08:57:06.2524570Z  * [new branch]              gh/v0i0/16/base             -> origin/gh/v0i0/16/base
2025-12-04T08:57:06.2526182Z  * [new branch]              gh/v0i0/16/head             -> origin/gh/v0i0/16/head
2025-12-04T08:57:06.2527767Z  * [new branch]              gh/v0i0/16/orig             -> origin/gh/v0i0/16/orig
2025-12-04T08:57:06.2529934Z  * [new branch]              gh/v0i0/17/base             -> origin/gh/v0i0/17/base
2025-12-04T08:57:06.2531491Z  * [new branch]              gh/v0i0/17/head             -> origin/gh/v0i0/17/head
2025-12-04T08:57:06.2533047Z  * [new branch]              gh/v0i0/17/orig             -> origin/gh/v0i0/17/orig
2025-12-04T08:57:06.2535199Z  * [new branch]              gh/v0i0/18/base             -> origin/gh/v0i0/18/base
2025-12-04T08:57:06.2536852Z  * [new branch]              gh/v0i0/18/head             -> origin/gh/v0i0/18/head
2025-12-04T08:57:06.2538447Z  * [new branch]              gh/v0i0/18/orig             -> origin/gh/v0i0/18/orig
2025-12-04T08:57:06.2540496Z  * [new branch]              gh/v0i0/19/base             -> origin/gh/v0i0/19/base
2025-12-04T08:57:06.2542095Z  * [new branch]              gh/v0i0/19/head             -> origin/gh/v0i0/19/head
2025-12-04T08:57:06.2543690Z  * [new branch]              gh/v0i0/19/orig             -> origin/gh/v0i0/19/orig
2025-12-04T08:57:06.2546372Z  * [new branch]              gh/vishal9-team/1/base      -> origin/gh/vishal9-team/1/base
2025-12-04T08:57:06.2548046Z  * [new branch]              gh/vishal9-team/1/head      -> origin/gh/vishal9-team/1/head
2025-12-04T08:57:06.2550136Z  * [new branch]              gh/vishal9-team/2/base      -> origin/gh/vishal9-team/2/base
2025-12-04T08:57:06.2551764Z  * [new branch]              gh/vishal9-team/2/head      -> origin/gh/vishal9-team/2/head
2025-12-04T08:57:06.2553369Z  * [new branch]              gh/vishal9-team/2/orig      -> origin/gh/vishal9-team/2/orig
2025-12-04T08:57:06.2555537Z  * [new branch]              gh/vishal9-team/3/base      -> origin/gh/vishal9-team/3/base
2025-12-04T08:57:06.2557343Z  * [new branch]              gh/vishal9-team/3/head      -> origin/gh/vishal9-team/3/head
2025-12-04T08:57:06.2558858Z  * [new branch]              gh/vishal9-team/3/orig      -> origin/gh/vishal9-team/3/orig
2025-12-04T08:57:06.2560903Z  * [new branch]              gh/vishal9-team/4/base      -> origin/gh/vishal9-team/4/base
2025-12-04T08:57:06.2562451Z  * [new branch]              gh/vishal9-team/4/head      -> origin/gh/vishal9-team/4/head
2025-12-04T08:57:06.2564041Z  * [new branch]              gh/vishal9-team/4/orig      -> origin/gh/vishal9-team/4/orig
2025-12-04T08:57:06.2566623Z  * [new branch]              gh/vkuzo/1/next             -> origin/gh/vkuzo/1/next
2025-12-04T08:57:06.2568714Z  * [new branch]              gh/vkuzo/2/next             -> origin/gh/vkuzo/2/next
2025-12-04T08:57:06.2570812Z  * [new branch]              gh/vkuzo/3/next             -> origin/gh/vkuzo/3/next
2025-12-04T08:57:06.2573407Z  * [new branch]              gh/wconstab/424/base        -> origin/gh/wconstab/424/base
2025-12-04T08:57:06.2575467Z  * [new branch]              gh/wconstab/424/head        -> origin/gh/wconstab/424/head
2025-12-04T08:57:06.2577097Z  * [new branch]              gh/wconstab/424/orig        -> origin/gh/wconstab/424/orig
2025-12-04T08:57:06.2579195Z  * [new branch]              gh/wconstab/435/base        -> origin/gh/wconstab/435/base
2025-12-04T08:57:06.2580919Z  * [new branch]              gh/wconstab/435/head        -> origin/gh/wconstab/435/head
2025-12-04T08:57:06.2582524Z  * [new branch]              gh/wconstab/435/orig        -> origin/gh/wconstab/435/orig
2025-12-04T08:57:06.2584701Z  * [new branch]              gh/wconstab/444/base        -> origin/gh/wconstab/444/base
2025-12-04T08:57:06.2586328Z  * [new branch]              gh/wconstab/444/head        -> origin/gh/wconstab/444/head
2025-12-04T08:57:06.2587940Z  * [new branch]              gh/wconstab/444/orig        -> origin/gh/wconstab/444/orig
2025-12-04T08:57:06.2590125Z  * [new branch]              gh/wconstab/447/base        -> origin/gh/wconstab/447/base
2025-12-04T08:57:06.2591699Z  * [new branch]              gh/wconstab/447/head        -> origin/gh/wconstab/447/head
2025-12-04T08:57:06.2593724Z  * [new branch]              gh/wconstab/447/orig        -> origin/gh/wconstab/447/orig
2025-12-04T08:57:06.2595885Z  * [new branch]              gh/wconstab/448/base        -> origin/gh/wconstab/448/base
2025-12-04T08:57:06.2597489Z  * [new branch]              gh/wconstab/448/head        -> origin/gh/wconstab/448/head
2025-12-04T08:57:06.2599222Z  * [new branch]              gh/wconstab/448/orig        -> origin/gh/wconstab/448/orig
2025-12-04T08:57:06.2601467Z  * [new branch]              gh/wconstab/449/base        -> origin/gh/wconstab/449/base
2025-12-04T08:57:06.2603057Z  * [new branch]              gh/wconstab/449/head        -> origin/gh/wconstab/449/head
2025-12-04T08:57:06.2604704Z  * [new branch]              gh/wconstab/449/orig        -> origin/gh/wconstab/449/orig
2025-12-04T08:57:06.2606793Z  * [new branch]              gh/wconstab/450/base        -> origin/gh/wconstab/450/base
2025-12-04T08:57:06.2608449Z  * [new branch]              gh/wconstab/450/head        -> origin/gh/wconstab/450/head
2025-12-04T08:57:06.2609952Z  * [new branch]              gh/wconstab/450/orig        -> origin/gh/wconstab/450/orig
2025-12-04T08:57:06.2611925Z  * [new branch]              gh/wconstab/451/base        -> origin/gh/wconstab/451/base
2025-12-04T08:57:06.2613598Z  * [new branch]              gh/wconstab/451/head        -> origin/gh/wconstab/451/head
2025-12-04T08:57:06.2615194Z  * [new branch]              gh/wconstab/451/orig        -> origin/gh/wconstab/451/orig
2025-12-04T08:57:06.2617569Z  * [new branch]              gh/wconstab/452/base        -> origin/gh/wconstab/452/base
2025-12-04T08:57:06.2619115Z  * [new branch]              gh/wconstab/452/head        -> origin/gh/wconstab/452/head
2025-12-04T08:57:06.2620624Z  * [new branch]              gh/wconstab/452/orig        -> origin/gh/wconstab/452/orig
2025-12-04T08:57:06.2622741Z  * [new branch]              gh/wconstab/453/base        -> origin/gh/wconstab/453/base
2025-12-04T08:57:06.2624385Z  * [new branch]              gh/wconstab/453/head        -> origin/gh/wconstab/453/head
2025-12-04T08:57:06.2626192Z  * [new branch]              gh/wconstab/453/orig        -> origin/gh/wconstab/453/orig
2025-12-04T08:57:06.2628314Z  * [new branch]              gh/wconstab/454/base        -> origin/gh/wconstab/454/base
2025-12-04T08:57:06.2629969Z  * [new branch]              gh/wconstab/454/head        -> origin/gh/wconstab/454/head
2025-12-04T08:57:06.2632893Z  * [new branch]              gh/wconstab/454/orig        -> origin/gh/wconstab/454/orig
2025-12-04T08:57:06.2634319Z  * [new branch]              gh/wconstab/455/base        -> origin/gh/wconstab/455/base
2025-12-04T08:57:06.2635987Z  * [new branch]              gh/wconstab/455/head        -> origin/gh/wconstab/455/head
2025-12-04T08:57:06.2637483Z  * [new branch]              gh/wconstab/455/orig        -> origin/gh/wconstab/455/orig
2025-12-04T08:57:06.2639798Z  * [new branch]              gh/wconstab/456/base        -> origin/gh/wconstab/456/base
2025-12-04T08:57:06.2641741Z  * [new branch]              gh/wconstab/456/head        -> origin/gh/wconstab/456/head
2025-12-04T08:57:06.2643442Z  * [new branch]              gh/wconstab/456/orig        -> origin/gh/wconstab/456/orig
2025-12-04T08:57:06.2645514Z  * [new branch]              gh/wconstab/457/base        -> origin/gh/wconstab/457/base
2025-12-04T08:57:06.2647071Z  * [new branch]              gh/wconstab/457/head        -> origin/gh/wconstab/457/head
2025-12-04T08:57:06.2648723Z  * [new branch]              gh/wconstab/457/orig        -> origin/gh/wconstab/457/orig
2025-12-04T08:57:06.2650922Z  * [new branch]              gh/wconstab/458/base        -> origin/gh/wconstab/458/base
2025-12-04T08:57:06.2652551Z  * [new branch]              gh/wconstab/458/head        -> origin/gh/wconstab/458/head
2025-12-04T08:57:06.2654133Z  * [new branch]              gh/wconstab/458/orig        -> origin/gh/wconstab/458/orig
2025-12-04T08:57:06.2656180Z  * [new branch]              gh/wconstab/459/base        -> origin/gh/wconstab/459/base
2025-12-04T08:57:06.2657806Z  * [new branch]              gh/wconstab/459/head        -> origin/gh/wconstab/459/head
2025-12-04T08:57:06.2659378Z  * [new branch]              gh/wconstab/459/orig        -> origin/gh/wconstab/459/orig
2025-12-04T08:57:06.2662054Z  * [new branch]              gh/wconstab/460/base        -> origin/gh/wconstab/460/base
2025-12-04T08:57:06.2663760Z  * [new branch]              gh/wconstab/460/head        -> origin/gh/wconstab/460/head
2025-12-04T08:57:06.2665461Z  * [new branch]              gh/wconstab/460/orig        -> origin/gh/wconstab/460/orig
2025-12-04T08:57:06.2667713Z  * [new branch]              gh/wconstab/461/base        -> origin/gh/wconstab/461/base
2025-12-04T08:57:06.2669268Z  * [new branch]              gh/wconstab/461/head        -> origin/gh/wconstab/461/head
2025-12-04T08:57:06.2670821Z  * [new branch]              gh/wconstab/461/orig        -> origin/gh/wconstab/461/orig
2025-12-04T08:57:06.2672967Z  * [new branch]              gh/wconstab/462/base        -> origin/gh/wconstab/462/base
2025-12-04T08:57:06.2674617Z  * [new branch]              gh/wconstab/462/head        -> origin/gh/wconstab/462/head
2025-12-04T08:57:06.2676266Z  * [new branch]              gh/wconstab/462/orig        -> origin/gh/wconstab/462/orig
2025-12-04T08:57:06.2678495Z  * [new branch]              gh/wconstab/463/base        -> origin/gh/wconstab/463/base
2025-12-04T08:57:06.2680158Z  * [new branch]              gh/wconstab/463/head        -> origin/gh/wconstab/463/head
2025-12-04T08:57:06.2681801Z  * [new branch]              gh/wconstab/463/orig        -> origin/gh/wconstab/463/orig
2025-12-04T08:57:06.2683975Z  * [new branch]              gh/wconstab/464/base        -> origin/gh/wconstab/464/base
2025-12-04T08:57:06.2685623Z  * [new branch]              gh/wconstab/464/head        -> origin/gh/wconstab/464/head
2025-12-04T08:57:06.2687380Z  * [new branch]              gh/wconstab/464/orig        -> origin/gh/wconstab/464/orig
2025-12-04T08:57:06.2689479Z  * [new branch]              gh/wconstab/465/base        -> origin/gh/wconstab/465/base
2025-12-04T08:57:06.2691071Z  * [new branch]              gh/wconstab/465/head        -> origin/gh/wconstab/465/head
2025-12-04T08:57:06.2692987Z  * [new branch]              gh/wconstab/465/orig        -> origin/gh/wconstab/465/orig
2025-12-04T08:57:06.2695335Z  * [new branch]              gh/wconstab/466/base        -> origin/gh/wconstab/466/base
2025-12-04T08:57:06.2696852Z  * [new branch]              gh/wconstab/466/head        -> origin/gh/wconstab/466/head
2025-12-04T08:57:06.2698339Z  * [new branch]              gh/wconstab/466/orig        -> origin/gh/wconstab/466/orig
2025-12-04T08:57:06.2700921Z  * [new branch]              gh/wconstab/467/base        -> origin/gh/wconstab/467/base
2025-12-04T08:57:06.2702586Z  * [new branch]              gh/wconstab/467/head        -> origin/gh/wconstab/467/head
2025-12-04T08:57:06.2704216Z  * [new branch]              gh/wconstab/467/orig        -> origin/gh/wconstab/467/orig
2025-12-04T08:57:06.2706254Z  * [new branch]              gh/wconstab/468/base        -> origin/gh/wconstab/468/base
2025-12-04T08:57:06.2707816Z  * [new branch]              gh/wconstab/468/head        -> origin/gh/wconstab/468/head
2025-12-04T08:57:06.2709368Z  * [new branch]              gh/wconstab/468/orig        -> origin/gh/wconstab/468/orig
2025-12-04T08:57:06.2711977Z  * [new branch]              gh/weifengpy/39/base        -> origin/gh/weifengpy/39/base
2025-12-04T08:57:06.2713509Z  * [new branch]              gh/weifengpy/39/head        -> origin/gh/weifengpy/39/head
2025-12-04T08:57:06.2715214Z  * [new branch]              gh/weifengpy/39/orig        -> origin/gh/weifengpy/39/orig
2025-12-04T08:57:06.2717730Z  * [new branch]              gh/weifengpy/40/base        -> origin/gh/weifengpy/40/base
2025-12-04T08:57:06.2719268Z  * [new branch]              gh/weifengpy/40/head        -> origin/gh/weifengpy/40/head
2025-12-04T08:57:06.2721066Z  * [new branch]              gh/weifengpy/40/orig        -> origin/gh/weifengpy/40/orig
2025-12-04T08:57:06.2723239Z  * [new branch]              gh/weifengpy/41/base        -> origin/gh/weifengpy/41/base
2025-12-04T08:57:06.2724844Z  * [new branch]              gh/weifengpy/41/head        -> origin/gh/weifengpy/41/head
2025-12-04T08:57:06.2726566Z  * [new branch]              gh/weifengpy/41/orig        -> origin/gh/weifengpy/41/orig
2025-12-04T08:57:06.2729221Z  * [new branch]              gh/williamwen42/250/base    -> origin/gh/williamwen42/250/base
2025-12-04T08:57:06.2730815Z  * [new branch]              gh/williamwen42/250/head    -> origin/gh/williamwen42/250/head
2025-12-04T08:57:06.2732804Z  * [new branch]              gh/williamwen42/250/orig    -> origin/gh/williamwen42/250/orig
2025-12-04T08:57:06.2735043Z  * [new branch]              gh/williamwen42/279/base    -> origin/gh/williamwen42/279/base
2025-12-04T08:57:06.2736776Z  * [new branch]              gh/williamwen42/279/head    -> origin/gh/williamwen42/279/head
2025-12-04T08:57:06.2738353Z  * [new branch]              gh/williamwen42/279/orig    -> origin/gh/williamwen42/279/orig
2025-12-04T08:57:06.2740497Z  * [new branch]              gh/williamwen42/282/base    -> origin/gh/williamwen42/282/base
2025-12-04T08:57:06.2742093Z  * [new branch]              gh/williamwen42/282/head    -> origin/gh/williamwen42/282/head
2025-12-04T08:57:06.2743706Z  * [new branch]              gh/williamwen42/282/orig    -> origin/gh/williamwen42/282/orig
2025-12-04T08:57:06.2745875Z  * [new branch]              gh/williamwen42/287/base    -> origin/gh/williamwen42/287/base
2025-12-04T08:57:06.2747504Z  * [new branch]              gh/williamwen42/287/head    -> origin/gh/williamwen42/287/head
2025-12-04T08:57:06.2749126Z  * [new branch]              gh/williamwen42/287/orig    -> origin/gh/williamwen42/287/orig
2025-12-04T08:57:06.2751335Z  * [new branch]              gh/williamwen42/288/base    -> origin/gh/williamwen42/288/base
2025-12-04T08:57:06.2753134Z  * [new branch]              gh/williamwen42/288/head    -> origin/gh/williamwen42/288/head
2025-12-04T08:57:06.2754510Z  * [new branch]              gh/williamwen42/288/orig    -> origin/gh/williamwen42/288/orig
2025-12-04T08:57:06.2756833Z  * [new branch]              gh/williamwen42/296/base    -> origin/gh/williamwen42/296/base
2025-12-04T08:57:06.2758565Z  * [new branch]              gh/williamwen42/296/head    -> origin/gh/williamwen42/296/head
2025-12-04T08:57:06.2760176Z  * [new branch]              gh/williamwen42/296/orig    -> origin/gh/williamwen42/296/orig
2025-12-04T08:57:06.2762332Z  * [new branch]              gh/williamwen42/297/base    -> origin/gh/williamwen42/297/base
2025-12-04T08:57:06.2763884Z  * [new branch]              gh/williamwen42/297/head    -> origin/gh/williamwen42/297/head
2025-12-04T08:57:06.2765547Z  * [new branch]              gh/williamwen42/297/orig    -> origin/gh/williamwen42/297/orig
2025-12-04T08:57:06.2767705Z  * [new branch]              gh/williamwen42/306/base    -> origin/gh/williamwen42/306/base
2025-12-04T08:57:06.2769439Z  * [new branch]              gh/williamwen42/306/head    -> origin/gh/williamwen42/306/head
2025-12-04T08:57:06.2771018Z  * [new branch]              gh/williamwen42/306/orig    -> origin/gh/williamwen42/306/orig
2025-12-04T08:57:06.2773235Z  * [new branch]              gh/williamwen42/309/base    -> origin/gh/williamwen42/309/base
2025-12-04T08:57:06.2774861Z  * [new branch]              gh/williamwen42/309/head    -> origin/gh/williamwen42/309/head
2025-12-04T08:57:06.2776437Z  * [new branch]              gh/williamwen42/309/orig    -> origin/gh/williamwen42/309/orig
2025-12-04T08:57:06.2778515Z  * [new branch]              gh/williamwen42/310/base    -> origin/gh/williamwen42/310/base
2025-12-04T08:57:06.2780158Z  * [new branch]              gh/williamwen42/310/head    -> origin/gh/williamwen42/310/head
2025-12-04T08:57:06.2781743Z  * [new branch]              gh/williamwen42/310/orig    -> origin/gh/williamwen42/310/orig
2025-12-04T08:57:06.2784899Z  * [new branch]              gh/williamwen42/311/base    -> origin/gh/williamwen42/311/base
2025-12-04T08:57:06.2786475Z  * [new branch]              gh/williamwen42/311/head    -> origin/gh/williamwen42/311/head
2025-12-04T08:57:06.2788051Z  * [new branch]              gh/williamwen42/311/orig    -> origin/gh/williamwen42/311/orig
2025-12-04T08:57:06.2790210Z  * [new branch]              gh/williamwen42/319/base    -> origin/gh/williamwen42/319/base
2025-12-04T08:57:06.2791804Z  * [new branch]              gh/williamwen42/319/head    -> origin/gh/williamwen42/319/head
2025-12-04T08:57:06.2793396Z  * [new branch]              gh/williamwen42/319/orig    -> origin/gh/williamwen42/319/orig
2025-12-04T08:57:06.2795641Z  * [new branch]              gh/williamwen42/325/base    -> origin/gh/williamwen42/325/base
2025-12-04T08:57:06.2797257Z  * [new branch]              gh/williamwen42/325/head    -> origin/gh/williamwen42/325/head
2025-12-04T08:57:06.2798900Z  * [new branch]              gh/williamwen42/325/orig    -> origin/gh/williamwen42/325/orig
2025-12-04T08:57:06.2801210Z  * [new branch]              gh/williamwen42/326/base    -> origin/gh/williamwen42/326/base
2025-12-04T08:57:06.2803343Z  * [new branch]              gh/williamwen42/326/head    -> origin/gh/williamwen42/326/head
2025-12-04T08:57:06.2804949Z  * [new branch]              gh/williamwen42/326/orig    -> origin/gh/williamwen42/326/orig
2025-12-04T08:57:06.2807203Z  * [new branch]              gh/williamwen42/327/base    -> origin/gh/williamwen42/327/base
2025-12-04T08:57:06.2808776Z  * [new branch]              gh/williamwen42/327/head    -> origin/gh/williamwen42/327/head
2025-12-04T08:57:06.2810277Z  * [new branch]              gh/williamwen42/327/orig    -> origin/gh/williamwen42/327/orig
2025-12-04T08:57:06.2812471Z  * [new branch]              gh/williamwen42/328/base    -> origin/gh/williamwen42/328/base
2025-12-04T08:57:06.2814096Z  * [new branch]              gh/williamwen42/328/head    -> origin/gh/williamwen42/328/head
2025-12-04T08:57:06.2815775Z  * [new branch]              gh/williamwen42/328/orig    -> origin/gh/williamwen42/328/orig
2025-12-04T08:57:06.2819958Z  * [new branch]              gh/williamwen42/329/base    -> origin/gh/williamwen42/329/base
2025-12-04T08:57:06.2821569Z  * [new branch]              gh/williamwen42/329/head    -> origin/gh/williamwen42/329/head
2025-12-04T08:57:06.2823296Z  * [new branch]              gh/williamwen42/329/orig    -> origin/gh/williamwen42/329/orig
2025-12-04T08:57:06.2825527Z  * [new branch]              gh/williamwen42/330/base    -> origin/gh/williamwen42/330/base
2025-12-04T08:57:06.2827206Z  * [new branch]              gh/williamwen42/330/head    -> origin/gh/williamwen42/330/head
2025-12-04T08:57:06.2828793Z  * [new branch]              gh/williamwen42/330/orig    -> origin/gh/williamwen42/330/orig
2025-12-04T08:57:06.2831018Z  * [new branch]              gh/williamwen42/331/base    -> origin/gh/williamwen42/331/base
2025-12-04T08:57:06.2832705Z  * [new branch]              gh/williamwen42/331/head    -> origin/gh/williamwen42/331/head
2025-12-04T08:57:06.2834312Z  * [new branch]              gh/williamwen42/331/orig    -> origin/gh/williamwen42/331/orig
2025-12-04T08:57:06.2836424Z  * [new branch]              gh/williamwen42/332/base    -> origin/gh/williamwen42/332/base
2025-12-04T08:57:06.2838048Z  * [new branch]              gh/williamwen42/332/head    -> origin/gh/williamwen42/332/head
2025-12-04T08:57:06.2839603Z  * [new branch]              gh/williamwen42/332/orig    -> origin/gh/williamwen42/332/orig
2025-12-04T08:57:06.2842104Z  * [new branch]              gh/williamwen42/333/base    -> origin/gh/williamwen42/333/base
2025-12-04T08:57:06.2843675Z  * [new branch]              gh/williamwen42/333/head    -> origin/gh/williamwen42/333/head
2025-12-04T08:57:06.2845221Z  * [new branch]              gh/williamwen42/333/orig    -> origin/gh/williamwen42/333/orig
2025-12-04T08:57:06.2847404Z  * [new branch]              gh/williamwen42/334/base    -> origin/gh/williamwen42/334/base
2025-12-04T08:57:06.2849180Z  * [new branch]              gh/williamwen42/334/head    -> origin/gh/williamwen42/334/head
2025-12-04T08:57:06.2850778Z  * [new branch]              gh/williamwen42/334/orig    -> origin/gh/williamwen42/334/orig
2025-12-04T08:57:06.2853522Z  * [new branch]              gh/williamwen42/335/base    -> origin/gh/williamwen42/335/base
2025-12-04T08:57:06.2858080Z  * [new branch]              gh/williamwen42/335/head    -> origin/gh/williamwen42/335/head
2025-12-04T08:57:06.2859788Z  * [new branch]              gh/williamwen42/335/orig    -> origin/gh/williamwen42/335/orig
2025-12-04T08:57:06.2862027Z  * [new branch]              gh/williamwen42/336/base    -> origin/gh/williamwen42/336/base
2025-12-04T08:57:06.2863555Z  * [new branch]              gh/williamwen42/336/head    -> origin/gh/williamwen42/336/head
2025-12-04T08:57:06.2865098Z  * [new branch]              gh/williamwen42/336/orig    -> origin/gh/williamwen42/336/orig
2025-12-04T08:57:06.2867317Z  * [new branch]              gh/williamwen42/337/base    -> origin/gh/williamwen42/337/base
2025-12-04T08:57:06.2868885Z  * [new branch]              gh/williamwen42/337/head    -> origin/gh/williamwen42/337/head
2025-12-04T08:57:06.2870369Z  * [new branch]              gh/williamwen42/337/orig    -> origin/gh/williamwen42/337/orig
2025-12-04T08:57:06.2872629Z  * [new branch]              gh/williamwen42/338/base    -> origin/gh/williamwen42/338/base
2025-12-04T08:57:06.2874298Z  * [new branch]              gh/williamwen42/338/head    -> origin/gh/williamwen42/338/head
2025-12-04T08:57:06.2875902Z  * [new branch]              gh/williamwen42/338/orig    -> origin/gh/williamwen42/338/orig
2025-12-04T08:57:06.2878106Z  * [new branch]              gh/williamwen42/339/base    -> origin/gh/williamwen42/339/base
2025-12-04T08:57:06.2879700Z  * [new branch]              gh/williamwen42/339/head    -> origin/gh/williamwen42/339/head
2025-12-04T08:57:06.2881419Z  * [new branch]              gh/williamwen42/339/orig    -> origin/gh/williamwen42/339/orig
2025-12-04T08:57:06.2883858Z  * [new branch]              gh/williamwen42/340/base    -> origin/gh/williamwen42/340/base
2025-12-04T08:57:06.2885274Z  * [new branch]              gh/williamwen42/340/head    -> origin/gh/williamwen42/340/head
2025-12-04T08:57:06.2886756Z  * [new branch]              gh/williamwen42/340/orig    -> origin/gh/williamwen42/340/orig
2025-12-04T08:57:06.2889038Z  * [new branch]              gh/williamwen42/341/base    -> origin/gh/williamwen42/341/base
2025-12-04T08:57:06.2890646Z  * [new branch]              gh/williamwen42/341/head    -> origin/gh/williamwen42/341/head
2025-12-04T08:57:06.2892651Z  * [new branch]              gh/williamwen42/341/orig    -> origin/gh/williamwen42/341/orig
2025-12-04T08:57:06.2894818Z  * [new branch]              gh/williamwen42/342/base    -> origin/gh/williamwen42/342/base
2025-12-04T08:57:06.2896944Z  * [new branch]              gh/williamwen42/342/head    -> origin/gh/williamwen42/342/head
2025-12-04T08:57:06.2898553Z  * [new branch]              gh/williamwen42/342/orig    -> origin/gh/williamwen42/342/orig
2025-12-04T08:57:06.2900779Z  * [new branch]              gh/williamwen42/343/base    -> origin/gh/williamwen42/343/base
2025-12-04T08:57:06.2902371Z  * [new branch]              gh/williamwen42/343/head    -> origin/gh/williamwen42/343/head
2025-12-04T08:57:06.2903932Z  * [new branch]              gh/williamwen42/343/orig    -> origin/gh/williamwen42/343/orig
2025-12-04T08:57:06.2906156Z  * [new branch]              gh/williamwen42/344/base    -> origin/gh/williamwen42/344/base
2025-12-04T08:57:06.2907755Z  * [new branch]              gh/williamwen42/344/head    -> origin/gh/williamwen42/344/head
2025-12-04T08:57:06.2909412Z  * [new branch]              gh/williamwen42/344/orig    -> origin/gh/williamwen42/344/orig
2025-12-04T08:57:06.2911727Z  * [new branch]              gh/williamwen42/345/base    -> origin/gh/williamwen42/345/base
2025-12-04T08:57:06.2913319Z  * [new branch]              gh/williamwen42/345/head    -> origin/gh/williamwen42/345/head
2025-12-04T08:57:06.2914912Z  * [new branch]              gh/williamwen42/345/orig    -> origin/gh/williamwen42/345/orig
2025-12-04T08:57:06.2917255Z  * [new branch]              gh/williamwen42/346/base    -> origin/gh/williamwen42/346/base
2025-12-04T08:57:06.2919106Z  * [new branch]              gh/williamwen42/346/head    -> origin/gh/williamwen42/346/head
2025-12-04T08:57:06.2920706Z  * [new branch]              gh/williamwen42/346/orig    -> origin/gh/williamwen42/346/orig
2025-12-04T08:57:06.2922988Z  * [new branch]              gh/williamwen42/347/base    -> origin/gh/williamwen42/347/base
2025-12-04T08:57:06.2924521Z  * [new branch]              gh/williamwen42/347/head    -> origin/gh/williamwen42/347/head
2025-12-04T08:57:06.2926155Z  * [new branch]              gh/williamwen42/347/orig    -> origin/gh/williamwen42/347/orig
2025-12-04T08:57:06.2928240Z  * [new branch]              gh/williamwen42/348/base    -> origin/gh/williamwen42/348/base
2025-12-04T08:57:06.2929807Z  * [new branch]              gh/williamwen42/348/head    -> origin/gh/williamwen42/348/head
2025-12-04T08:57:06.2931584Z  * [new branch]              gh/williamwen42/348/orig    -> origin/gh/williamwen42/348/orig
2025-12-04T08:57:06.2933724Z  * [new branch]              gh/williamwen42/349/base    -> origin/gh/williamwen42/349/base
2025-12-04T08:57:06.2935279Z  * [new branch]              gh/williamwen42/349/head    -> origin/gh/williamwen42/349/head
2025-12-04T08:57:06.2936832Z  * [new branch]              gh/williamwen42/349/orig    -> origin/gh/williamwen42/349/orig
2025-12-04T08:57:06.2939085Z  * [new branch]              gh/williamwen42/350/base    -> origin/gh/williamwen42/350/base
2025-12-04T08:57:06.2940588Z  * [new branch]              gh/williamwen42/350/head    -> origin/gh/williamwen42/350/head
2025-12-04T08:57:06.2942178Z  * [new branch]              gh/williamwen42/350/orig    -> origin/gh/williamwen42/350/orig
2025-12-04T08:57:06.2944575Z  * [new branch]              gh/williamwen42/351/base    -> origin/gh/williamwen42/351/base
2025-12-04T08:57:06.2946143Z  * [new branch]              gh/williamwen42/351/head    -> origin/gh/williamwen42/351/head
2025-12-04T08:57:06.2947802Z  * [new branch]              gh/williamwen42/351/orig    -> origin/gh/williamwen42/351/orig
2025-12-04T08:57:06.2949934Z  * [new branch]              gh/williamwen42/352/base    -> origin/gh/williamwen42/352/base
2025-12-04T08:57:06.2951587Z  * [new branch]              gh/williamwen42/352/head    -> origin/gh/williamwen42/352/head
2025-12-04T08:57:06.2953123Z  * [new branch]              gh/williamwen42/352/orig    -> origin/gh/williamwen42/352/orig
2025-12-04T08:57:06.2955404Z  * [new branch]              gh/williamwen42/353/base    -> origin/gh/williamwen42/353/base
2025-12-04T08:57:06.2957038Z  * [new branch]              gh/williamwen42/353/head    -> origin/gh/williamwen42/353/head
2025-12-04T08:57:06.2958604Z  * [new branch]              gh/williamwen42/353/orig    -> origin/gh/williamwen42/353/orig
2025-12-04T08:57:06.2960812Z  * [new branch]              gh/williamwen42/354/base    -> origin/gh/williamwen42/354/base
2025-12-04T08:57:06.2962534Z  * [new branch]              gh/williamwen42/354/head    -> origin/gh/williamwen42/354/head
2025-12-04T08:57:06.2964186Z  * [new branch]              gh/williamwen42/354/orig    -> origin/gh/williamwen42/354/orig
2025-12-04T08:57:06.2966377Z  * [new branch]              gh/williamwen42/355/base    -> origin/gh/williamwen42/355/base
2025-12-04T08:57:06.2967958Z  * [new branch]              gh/williamwen42/355/head    -> origin/gh/williamwen42/355/head
2025-12-04T08:57:06.2969576Z  * [new branch]              gh/williamwen42/355/orig    -> origin/gh/williamwen42/355/orig
2025-12-04T08:57:06.2971754Z  * [new branch]              gh/williamwen42/356/base    -> origin/gh/williamwen42/356/base
2025-12-04T08:57:06.2973328Z  * [new branch]              gh/williamwen42/356/head    -> origin/gh/williamwen42/356/head
2025-12-04T08:57:06.2974896Z  * [new branch]              gh/williamwen42/356/orig    -> origin/gh/williamwen42/356/orig
2025-12-04T08:57:06.2977108Z  * [new branch]              gh/williamwen42/357/base    -> origin/gh/williamwen42/357/base
2025-12-04T08:57:06.2978746Z  * [new branch]              gh/williamwen42/357/head    -> origin/gh/williamwen42/357/head
2025-12-04T08:57:06.2980290Z  * [new branch]              gh/williamwen42/357/orig    -> origin/gh/williamwen42/357/orig
2025-12-04T08:57:06.2982499Z  * [new branch]              gh/williamwen42/358/base    -> origin/gh/williamwen42/358/base
2025-12-04T08:57:06.2984267Z  * [new branch]              gh/williamwen42/358/head    -> origin/gh/williamwen42/358/head
2025-12-04T08:57:06.2985885Z  * [new branch]              gh/williamwen42/358/orig    -> origin/gh/williamwen42/358/orig
2025-12-04T08:57:06.2988475Z  * [new branch]              gh/xmfan/169/base           -> origin/gh/xmfan/169/base
2025-12-04T08:57:06.2990147Z  * [new branch]              gh/xmfan/169/head           -> origin/gh/xmfan/169/head
2025-12-04T08:57:06.2992179Z  * [new branch]              gh/xmfan/170/base           -> origin/gh/xmfan/170/base
2025-12-04T08:57:06.2993684Z  * [new branch]              gh/xmfan/170/head           -> origin/gh/xmfan/170/head
2025-12-04T08:57:06.2995780Z  * [new branch]              gh/xmfan/274/base           -> origin/gh/xmfan/274/base
2025-12-04T08:57:06.2997368Z  * [new branch]              gh/xmfan/274/head           -> origin/gh/xmfan/274/head
2025-12-04T08:57:06.2998978Z  * [new branch]              gh/xmfan/274/orig           -> origin/gh/xmfan/274/orig
2025-12-04T08:57:06.3001619Z  * [new branch]              gh/xmfan/277/base           -> origin/gh/xmfan/277/base
2025-12-04T08:57:06.3003166Z  * [new branch]              gh/xmfan/277/head           -> origin/gh/xmfan/277/head
2025-12-04T08:57:06.3004849Z  * [new branch]              gh/xmfan/277/orig           -> origin/gh/xmfan/277/orig
2025-12-04T08:57:06.3006887Z  * [new branch]              gh/xmfan/301/base           -> origin/gh/xmfan/301/base
2025-12-04T08:57:06.3008562Z  * [new branch]              gh/xmfan/301/head           -> origin/gh/xmfan/301/head
2025-12-04T08:57:06.3010015Z  * [new branch]              gh/xmfan/301/orig           -> origin/gh/xmfan/301/orig
2025-12-04T08:57:06.3012119Z  * [new branch]              gh/xmfan/304/base           -> origin/gh/xmfan/304/base
2025-12-04T08:57:06.3013670Z  * [new branch]              gh/xmfan/304/head           -> origin/gh/xmfan/304/head
2025-12-04T08:57:06.3015238Z  * [new branch]              gh/xmfan/304/orig           -> origin/gh/xmfan/304/orig
2025-12-04T08:57:06.3017995Z  * [new branch]              gh/xmfan/309/base           -> origin/gh/xmfan/309/base
2025-12-04T08:57:06.3019657Z  * [new branch]              gh/xmfan/309/head           -> origin/gh/xmfan/309/head
2025-12-04T08:57:06.3021193Z  * [new branch]              gh/xmfan/309/orig           -> origin/gh/xmfan/309/orig
2025-12-04T08:57:06.3023287Z  * [new branch]              gh/xmfan/310/base           -> origin/gh/xmfan/310/base
2025-12-04T08:57:06.3024878Z  * [new branch]              gh/xmfan/310/head           -> origin/gh/xmfan/310/head
2025-12-04T08:57:06.3026577Z  * [new branch]              gh/xmfan/310/orig           -> origin/gh/xmfan/310/orig
2025-12-04T08:57:06.3028642Z  * [new branch]              gh/xmfan/311/base           -> origin/gh/xmfan/311/base
2025-12-04T08:57:06.3030257Z  * [new branch]              gh/xmfan/311/head           -> origin/gh/xmfan/311/head
2025-12-04T08:57:06.3031810Z  * [new branch]              gh/xmfan/311/orig           -> origin/gh/xmfan/311/orig
2025-12-04T08:57:06.3033910Z  * [new branch]              gh/xmfan/312/base           -> origin/gh/xmfan/312/base
2025-12-04T08:57:06.3035478Z  * [new branch]              gh/xmfan/312/head           -> origin/gh/xmfan/312/head
2025-12-04T08:57:06.3037097Z  * [new branch]              gh/xmfan/312/orig           -> origin/gh/xmfan/312/orig
2025-12-04T08:57:06.3039276Z  * [new branch]              gh/xmfan/313/base           -> origin/gh/xmfan/313/base
2025-12-04T08:57:06.3040964Z  * [new branch]              gh/xmfan/313/head           -> origin/gh/xmfan/313/head
2025-12-04T08:57:06.3042598Z  * [new branch]              gh/xmfan/313/orig           -> origin/gh/xmfan/313/orig
2025-12-04T08:57:06.3045157Z  * [new branch]              gh/xuanzhang816/27/base     -> origin/gh/xuanzhang816/27/base
2025-12-04T08:57:06.3046761Z  * [new branch]              gh/xuanzhang816/27/head     -> origin/gh/xuanzhang816/27/head
2025-12-04T08:57:06.3048354Z  * [new branch]              gh/xuanzhang816/27/orig     -> origin/gh/xuanzhang816/27/orig
2025-12-04T08:57:06.3050474Z  * [new branch]              gh/xuanzhang816/32/base     -> origin/gh/xuanzhang816/32/base
2025-12-04T08:57:06.3052014Z  * [new branch]              gh/xuanzhang816/32/head     -> origin/gh/xuanzhang816/32/head
2025-12-04T08:57:06.3053580Z  * [new branch]              gh/xuanzhang816/32/orig     -> origin/gh/xuanzhang816/32/orig
2025-12-04T08:57:06.3055662Z  * [new branch]              gh/xuanzhang816/33/base     -> origin/gh/xuanzhang816/33/base
2025-12-04T08:57:06.3057303Z  * [new branch]              gh/xuanzhang816/33/head     -> origin/gh/xuanzhang816/33/head
2025-12-04T08:57:06.3058961Z  * [new branch]              gh/xuanzhang816/33/orig     -> origin/gh/xuanzhang816/33/orig
2025-12-04T08:57:06.3061773Z  * [new branch]              gh/xuanzhang816/34/base     -> origin/gh/xuanzhang816/34/base
2025-12-04T08:57:06.3063379Z  * [new branch]              gh/xuanzhang816/34/head     -> origin/gh/xuanzhang816/34/head
2025-12-04T08:57:06.3064989Z  * [new branch]              gh/xuanzhang816/34/orig     -> origin/gh/xuanzhang816/34/orig
2025-12-04T08:57:06.3067345Z  * [new branch]              gh/xuanzhang816/35/base     -> origin/gh/xuanzhang816/35/base
2025-12-04T08:57:06.3068982Z  * [new branch]              gh/xuanzhang816/35/head     -> origin/gh/xuanzhang816/35/head
2025-12-04T08:57:06.3070573Z  * [new branch]              gh/xuanzhang816/35/orig     -> origin/gh/xuanzhang816/35/orig
2025-12-04T08:57:06.3073349Z  * [new branch]              gh/yanbing-j/11/base        -> origin/gh/yanbing-j/11/base
2025-12-04T08:57:06.3074800Z  * [new branch]              gh/yanbing-j/11/head        -> origin/gh/yanbing-j/11/head
2025-12-04T08:57:06.3076356Z  * [new branch]              gh/yanbing-j/11/orig        -> origin/gh/yanbing-j/11/orig
2025-12-04T08:57:06.3078432Z  * [new branch]              gh/yanbing-j/12/base        -> origin/gh/yanbing-j/12/base
2025-12-04T08:57:06.3080058Z  * [new branch]              gh/yanbing-j/12/head        -> origin/gh/yanbing-j/12/head
2025-12-04T08:57:06.3081688Z  * [new branch]              gh/yanbing-j/12/orig        -> origin/gh/yanbing-j/12/orig
2025-12-04T08:57:06.3083836Z  * [new branch]              gh/yanbing-j/13/base        -> origin/gh/yanbing-j/13/base
2025-12-04T08:57:06.3085434Z  * [new branch]              gh/yanbing-j/13/head        -> origin/gh/yanbing-j/13/head
2025-12-04T08:57:06.3087012Z  * [new branch]              gh/yanbing-j/13/orig        -> origin/gh/yanbing-j/13/orig
2025-12-04T08:57:06.3089183Z  * [new branch]              gh/yanbing-j/14/base        -> origin/gh/yanbing-j/14/base
2025-12-04T08:57:06.3090807Z  * [new branch]              gh/yanbing-j/14/head        -> origin/gh/yanbing-j/14/head
2025-12-04T08:57:06.3092762Z  * [new branch]              gh/yanbing-j/14/orig        -> origin/gh/yanbing-j/14/orig
2025-12-04T08:57:06.3094778Z  * [new branch]              gh/yanbing-j/15/base        -> origin/gh/yanbing-j/15/base
2025-12-04T08:57:06.3096353Z  * [new branch]              gh/yanbing-j/15/head        -> origin/gh/yanbing-j/15/head
2025-12-04T08:57:06.3097929Z  * [new branch]              gh/yanbing-j/15/orig        -> origin/gh/yanbing-j/15/orig
2025-12-04T08:57:06.3100005Z  * [new branch]              gh/yanbing-j/18/base        -> origin/gh/yanbing-j/18/base
2025-12-04T08:57:06.3101659Z  * [new branch]              gh/yanbing-j/18/head        -> origin/gh/yanbing-j/18/head
2025-12-04T08:57:06.3103279Z  * [new branch]              gh/yanbing-j/18/orig        -> origin/gh/yanbing-j/18/orig
2025-12-04T08:57:06.3105398Z  * [new branch]              gh/yanbing-j/19/base        -> origin/gh/yanbing-j/19/base
2025-12-04T08:57:06.3107420Z  * [new branch]              gh/yanbing-j/19/head        -> origin/gh/yanbing-j/19/head
2025-12-04T08:57:06.3109007Z  * [new branch]              gh/yanbing-j/19/orig        -> origin/gh/yanbing-j/19/orig
2025-12-04T08:57:06.3111198Z  * [new branch]              gh/yanbing-j/20/base        -> origin/gh/yanbing-j/20/base
2025-12-04T08:57:06.3112791Z  * [new branch]              gh/yanbing-j/20/head        -> origin/gh/yanbing-j/20/head
2025-12-04T08:57:06.3114382Z  * [new branch]              gh/yanbing-j/20/orig        -> origin/gh/yanbing-j/20/orig
2025-12-04T08:57:06.3116515Z  * [new branch]              gh/yanbing-j/21/base        -> origin/gh/yanbing-j/21/base
2025-12-04T08:57:06.3118396Z  * [new branch]              gh/yanbing-j/21/head        -> origin/gh/yanbing-j/21/head
2025-12-04T08:57:06.3120539Z  * [new branch]              gh/yanbing-j/22/base        -> origin/gh/yanbing-j/22/base
2025-12-04T08:57:06.3122061Z  * [new branch]              gh/yanbing-j/22/head        -> origin/gh/yanbing-j/22/head
2025-12-04T08:57:06.3123640Z  * [new branch]              gh/yanbing-j/22/orig        -> origin/gh/yanbing-j/22/orig
2025-12-04T08:57:06.3125922Z  * [new branch]              gh/yanbing-j/23/base        -> origin/gh/yanbing-j/23/base
2025-12-04T08:57:06.3127520Z  * [new branch]              gh/yanbing-j/23/head        -> origin/gh/yanbing-j/23/head
2025-12-04T08:57:06.3129223Z  * [new branch]              gh/yanbing-j/23/orig        -> origin/gh/yanbing-j/23/orig
2025-12-04T08:57:06.3131338Z  * [new branch]              gh/yanbing-j/24/base        -> origin/gh/yanbing-j/24/base
2025-12-04T08:57:06.3132995Z  * [new branch]              gh/yanbing-j/24/head        -> origin/gh/yanbing-j/24/head
2025-12-04T08:57:06.3134606Z  * [new branch]              gh/yanbing-j/24/orig        -> origin/gh/yanbing-j/24/orig
2025-12-04T08:57:06.3136868Z  * [new branch]              gh/yanbing-j/25/base        -> origin/gh/yanbing-j/25/base
2025-12-04T08:57:06.3138384Z  * [new branch]              gh/yanbing-j/25/head        -> origin/gh/yanbing-j/25/head
2025-12-04T08:57:06.3139848Z  * [new branch]              gh/yanbing-j/25/orig        -> origin/gh/yanbing-j/25/orig
2025-12-04T08:57:06.3141986Z  * [new branch]              gh/yanbing-j/26/base        -> origin/gh/yanbing-j/26/base
2025-12-04T08:57:06.3143526Z  * [new branch]              gh/yanbing-j/26/head        -> origin/gh/yanbing-j/26/head
2025-12-04T08:57:06.3145134Z  * [new branch]              gh/yanbing-j/26/orig        -> origin/gh/yanbing-j/26/orig
2025-12-04T08:57:06.3147845Z  * [new branch]              gh/yang-yu-hang/1/base      -> origin/gh/yang-yu-hang/1/base
2025-12-04T08:57:06.3149508Z  * [new branch]              gh/yang-yu-hang/1/head      -> origin/gh/yang-yu-hang/1/head
2025-12-04T08:57:06.3151205Z  * [new branch]              gh/yang-yu-hang/1/orig      -> origin/gh/yang-yu-hang/1/orig
2025-12-04T08:57:06.3153907Z  * [new branch]              gh/yang-yu-hang/2/base      -> origin/gh/yang-yu-hang/2/base
2025-12-04T08:57:06.3155757Z  * [new branch]              gh/yang-yu-hang/2/head      -> origin/gh/yang-yu-hang/2/head
2025-12-04T08:57:06.3157546Z  * [new branch]              gh/yang-yu-hang/2/orig      -> origin/gh/yang-yu-hang/2/orig
2025-12-04T08:57:06.3159742Z  * [new branch]              gh/yang-yu-hang/3/base      -> origin/gh/yang-yu-hang/3/base
2025-12-04T08:57:06.3161485Z  * [new branch]              gh/yang-yu-hang/3/head      -> origin/gh/yang-yu-hang/3/head
2025-12-04T08:57:06.3163101Z  * [new branch]              gh/yang-yu-hang/3/orig      -> origin/gh/yang-yu-hang/3/orig
2025-12-04T08:57:06.3165628Z  * [new branch]              gh/yangw-dev/12/base        -> origin/gh/yangw-dev/12/base
2025-12-04T08:57:06.3167211Z  * [new branch]              gh/yangw-dev/12/head        -> origin/gh/yangw-dev/12/head
2025-12-04T08:57:06.3168866Z  * [new branch]              gh/yangw-dev/12/orig        -> origin/gh/yangw-dev/12/orig
2025-12-04T08:57:06.3170910Z  * [new branch]              gh/yangw-dev/13/base        -> origin/gh/yangw-dev/13/base
2025-12-04T08:57:06.3172522Z  * [new branch]              gh/yangw-dev/13/head        -> origin/gh/yangw-dev/13/head
2025-12-04T08:57:06.3174209Z  * [new branch]              gh/yangw-dev/13/orig        -> origin/gh/yangw-dev/13/orig
2025-12-04T08:57:06.3176417Z  * [new branch]              gh/yangw-dev/14/base        -> origin/gh/yangw-dev/14/base
2025-12-04T08:57:06.3178079Z  * [new branch]              gh/yangw-dev/14/head        -> origin/gh/yangw-dev/14/head
2025-12-04T08:57:06.3179680Z  * [new branch]              gh/yangw-dev/14/orig        -> origin/gh/yangw-dev/14/orig
2025-12-04T08:57:06.3181813Z  * [new branch]              gh/yangw-dev/15/base        -> origin/gh/yangw-dev/15/base
2025-12-04T08:57:06.3183366Z  * [new branch]              gh/yangw-dev/15/head        -> origin/gh/yangw-dev/15/head
2025-12-04T08:57:06.3185022Z  * [new branch]              gh/yangw-dev/15/orig        -> origin/gh/yangw-dev/15/orig
2025-12-04T08:57:06.3187092Z  * [new branch]              gh/yangw-dev/19/base        -> origin/gh/yangw-dev/19/base
2025-12-04T08:57:06.3188677Z  * [new branch]              gh/yangw-dev/19/head        -> origin/gh/yangw-dev/19/head
2025-12-04T08:57:06.3190259Z  * [new branch]              gh/yangw-dev/19/orig        -> origin/gh/yangw-dev/19/orig
2025-12-04T08:57:06.3192410Z  * [new branch]              gh/yangw-dev/26/base        -> origin/gh/yangw-dev/26/base
2025-12-04T08:57:06.3194046Z  * [new branch]              gh/yangw-dev/26/head        -> origin/gh/yangw-dev/26/head
2025-12-04T08:57:06.3195596Z  * [new branch]              gh/yangw-dev/26/orig        -> origin/gh/yangw-dev/26/orig
2025-12-04T08:57:06.3197779Z  * [new branch]              gh/yangw-dev/27/base        -> origin/gh/yangw-dev/27/base
2025-12-04T08:57:06.3199401Z  * [new branch]              gh/yangw-dev/27/head        -> origin/gh/yangw-dev/27/head
2025-12-04T08:57:06.3201196Z  * [new branch]              gh/yangw-dev/27/orig        -> origin/gh/yangw-dev/27/orig
2025-12-04T08:57:06.3203724Z  * [new branch]              gh/ydwu4/292/base           -> origin/gh/ydwu4/292/base
2025-12-04T08:57:06.3205255Z  * [new branch]              gh/ydwu4/292/head           -> origin/gh/ydwu4/292/head
2025-12-04T08:57:06.3206806Z  * [new branch]              gh/ydwu4/292/orig           -> origin/gh/ydwu4/292/orig
2025-12-04T08:57:06.3208988Z  * [new branch]              gh/ydwu4/294/base           -> origin/gh/ydwu4/294/base
2025-12-04T08:57:06.3210985Z  * [new branch]              gh/ydwu4/294/head           -> origin/gh/ydwu4/294/head
2025-12-04T08:57:06.3212566Z  * [new branch]              gh/ydwu4/294/orig           -> origin/gh/ydwu4/294/orig
2025-12-04T08:57:06.3214897Z  * [new branch]              gh/ydwu4/295/base           -> origin/gh/ydwu4/295/base
2025-12-04T08:57:06.3216531Z  * [new branch]              gh/ydwu4/295/head           -> origin/gh/ydwu4/295/head
2025-12-04T08:57:06.3219324Z  * [new branch]              gh/ydwu4/295/orig           -> origin/gh/ydwu4/295/orig
2025-12-04T08:57:06.3221345Z  * [new branch]              gh/ydwu4/296/base           -> origin/gh/ydwu4/296/base
2025-12-04T08:57:06.3222866Z  * [new branch]              gh/ydwu4/296/head           -> origin/gh/ydwu4/296/head
2025-12-04T08:57:06.3224455Z  * [new branch]              gh/ydwu4/296/orig           -> origin/gh/ydwu4/296/orig
2025-12-04T08:57:06.3226592Z  * [new branch]              gh/ydwu4/306/base           -> origin/gh/ydwu4/306/base
2025-12-04T08:57:06.3228602Z  * [new branch]              gh/ydwu4/306/head           -> origin/gh/ydwu4/306/head
2025-12-04T08:57:06.3230413Z  * [new branch]              gh/ydwu4/306/orig           -> origin/gh/ydwu4/306/orig
2025-12-04T08:57:06.3232512Z  * [new branch]              gh/ydwu4/312/base           -> origin/gh/ydwu4/312/base
2025-12-04T08:57:06.3234084Z  * [new branch]              gh/ydwu4/312/head           -> origin/gh/ydwu4/312/head
2025-12-04T08:57:06.3235663Z  * [new branch]              gh/ydwu4/312/orig           -> origin/gh/ydwu4/312/orig
2025-12-04T08:57:06.3237770Z  * [new branch]              gh/ydwu4/322/base           -> origin/gh/ydwu4/322/base
2025-12-04T08:57:06.3239340Z  * [new branch]              gh/ydwu4/322/head           -> origin/gh/ydwu4/322/head
2025-12-04T08:57:06.3241373Z  * [new branch]              gh/ydwu4/322/orig           -> origin/gh/ydwu4/322/orig
2025-12-04T08:57:06.3243582Z  * [new branch]              gh/ydwu4/327/base           -> origin/gh/ydwu4/327/base
2025-12-04T08:57:06.3245167Z  * [new branch]              gh/ydwu4/327/head           -> origin/gh/ydwu4/327/head
2025-12-04T08:57:06.3246752Z  * [new branch]              gh/ydwu4/327/orig           -> origin/gh/ydwu4/327/orig
2025-12-04T08:57:06.3248846Z  * [new branch]              gh/ydwu4/328/base           -> origin/gh/ydwu4/328/base
2025-12-04T08:57:06.3250320Z  * [new branch]              gh/ydwu4/328/head           -> origin/gh/ydwu4/328/head
2025-12-04T08:57:06.3251812Z  * [new branch]              gh/ydwu4/328/orig           -> origin/gh/ydwu4/328/orig
2025-12-04T08:57:06.3253771Z  * [new branch]              gh/ydwu4/329/base           -> origin/gh/ydwu4/329/base
2025-12-04T08:57:06.3255313Z  * [new branch]              gh/ydwu4/329/head           -> origin/gh/ydwu4/329/head
2025-12-04T08:57:06.3256915Z  * [new branch]              gh/ydwu4/329/orig           -> origin/gh/ydwu4/329/orig
2025-12-04T08:57:06.3259151Z  * [new branch]              gh/ydwu4/330/base           -> origin/gh/ydwu4/330/base
2025-12-04T08:57:06.3260676Z  * [new branch]              gh/ydwu4/330/head           -> origin/gh/ydwu4/330/head
2025-12-04T08:57:06.3262297Z  * [new branch]              gh/ydwu4/330/orig           -> origin/gh/ydwu4/330/orig
2025-12-04T08:57:06.3264387Z  * [new branch]              gh/ydwu4/331/base           -> origin/gh/ydwu4/331/base
2025-12-04T08:57:06.3265942Z  * [new branch]              gh/ydwu4/331/head           -> origin/gh/ydwu4/331/head
2025-12-04T08:57:06.3267724Z  * [new branch]              gh/ydwu4/331/orig           -> origin/gh/ydwu4/331/orig
2025-12-04T08:57:06.3269596Z  * [new branch]              gh/ydwu4/332/base           -> origin/gh/ydwu4/332/base
2025-12-04T08:57:06.3271135Z  * [new branch]              gh/ydwu4/332/head           -> origin/gh/ydwu4/332/head
2025-12-04T08:57:06.3272712Z  * [new branch]              gh/ydwu4/332/orig           -> origin/gh/ydwu4/332/orig
2025-12-04T08:57:06.3274669Z  * [new branch]              gh/ydwu4/333/base           -> origin/gh/ydwu4/333/base
2025-12-04T08:57:06.3276285Z  * [new branch]              gh/ydwu4/333/head           -> origin/gh/ydwu4/333/head
2025-12-04T08:57:06.3277892Z  * [new branch]              gh/ydwu4/333/orig           -> origin/gh/ydwu4/333/orig
2025-12-04T08:57:06.3279845Z  * [new branch]              gh/ydwu4/334/base           -> origin/gh/ydwu4/334/base
2025-12-04T08:57:06.3281588Z  * [new branch]              gh/ydwu4/334/head           -> origin/gh/ydwu4/334/head
2025-12-04T08:57:06.3283283Z  * [new branch]              gh/ydwu4/334/orig           -> origin/gh/ydwu4/334/orig
2025-12-04T08:57:06.3285727Z  * [new branch]              gh/ydwu4/335/base           -> origin/gh/ydwu4/335/base
2025-12-04T08:57:06.3287315Z  * [new branch]              gh/ydwu4/335/head           -> origin/gh/ydwu4/335/head
2025-12-04T08:57:06.3288907Z  * [new branch]              gh/ydwu4/335/orig           -> origin/gh/ydwu4/335/orig
2025-12-04T08:57:06.3291536Z  * [new branch]              gh/ydwu4/337/base           -> origin/gh/ydwu4/337/base
2025-12-04T08:57:06.3293621Z  * [new branch]              gh/ydwu4/337/head           -> origin/gh/ydwu4/337/head
2025-12-04T08:57:06.3295203Z  * [new branch]              gh/ydwu4/337/orig           -> origin/gh/ydwu4/337/orig
2025-12-04T08:57:06.3297358Z  * [new branch]              gh/ydwu4/339/base           -> origin/gh/ydwu4/339/base
2025-12-04T08:57:06.3298960Z  * [new branch]              gh/ydwu4/339/head           -> origin/gh/ydwu4/339/head
2025-12-04T08:57:06.3300508Z  * [new branch]              gh/ydwu4/339/orig           -> origin/gh/ydwu4/339/orig
2025-12-04T08:57:06.3303118Z  * [new branch]              gh/yf225/133/base           -> origin/gh/yf225/133/base
2025-12-04T08:57:06.3304765Z  * [new branch]              gh/yf225/133/head           -> origin/gh/yf225/133/head
2025-12-04T08:57:06.3306894Z  * [new branch]              gh/yf225/93/base            -> origin/gh/yf225/93/base
2025-12-04T08:57:06.3308522Z  * [new branch]              gh/yf225/93/head            -> origin/gh/yf225/93/head
2025-12-04T08:57:06.3311554Z  * [new branch]              gh/yifuwang/152/base        -> origin/gh/yifuwang/152/base
2025-12-04T08:57:06.3313475Z  * [new branch]              gh/yifuwang/152/head        -> origin/gh/yifuwang/152/head
2025-12-04T08:57:06.3315125Z  * [new branch]              gh/yifuwang/152/orig        -> origin/gh/yifuwang/152/orig
2025-12-04T08:57:06.3317879Z  * [new branch]              gh/yifuwang/195/base        -> origin/gh/yifuwang/195/base
2025-12-04T08:57:06.3319558Z  * [new branch]              gh/yifuwang/195/head        -> origin/gh/yifuwang/195/head
2025-12-04T08:57:06.3321543Z  * [new branch]              gh/yifuwang/195/orig        -> origin/gh/yifuwang/195/orig
2025-12-04T08:57:06.3324231Z  * [new branch]              gh/yiming0416/1/base        -> origin/gh/yiming0416/1/base
2025-12-04T08:57:06.3325946Z  * [new branch]              gh/yiming0416/1/head        -> origin/gh/yiming0416/1/head
2025-12-04T08:57:06.3327867Z  * [new branch]              gh/yiming0416/2/base        -> origin/gh/yiming0416/2/base
2025-12-04T08:57:06.3329693Z  * [new branch]              gh/yiming0416/2/head        -> origin/gh/yiming0416/2/head
2025-12-04T08:57:06.3332288Z  * [new branch]              gh/yushangdi/1/base         -> origin/gh/yushangdi/1/base
2025-12-04T08:57:06.3333863Z  * [new branch]              gh/yushangdi/1/head         -> origin/gh/yushangdi/1/head
2025-12-04T08:57:06.3336108Z  * [new branch]              gh/yushangdi/10/base        -> origin/gh/yushangdi/10/base
2025-12-04T08:57:06.3337725Z  * [new branch]              gh/yushangdi/10/head        -> origin/gh/yushangdi/10/head
2025-12-04T08:57:06.3339277Z  * [new branch]              gh/yushangdi/10/orig        -> origin/gh/yushangdi/10/orig
2025-12-04T08:57:06.3341362Z  * [new branch]              gh/yushangdi/11/base        -> origin/gh/yushangdi/11/base
2025-12-04T08:57:06.3342933Z  * [new branch]              gh/yushangdi/11/head        -> origin/gh/yushangdi/11/head
2025-12-04T08:57:06.3344555Z  * [new branch]              gh/yushangdi/11/orig        -> origin/gh/yushangdi/11/orig
2025-12-04T08:57:06.3346629Z  * [new branch]              gh/yushangdi/2/base         -> origin/gh/yushangdi/2/base
2025-12-04T08:57:06.3348201Z  * [new branch]              gh/yushangdi/2/head         -> origin/gh/yushangdi/2/head
2025-12-04T08:57:06.3350347Z  * [new branch]              gh/yushangdi/7/base         -> origin/gh/yushangdi/7/base
2025-12-04T08:57:06.3351895Z  * [new branch]              gh/yushangdi/7/head         -> origin/gh/yushangdi/7/head
2025-12-04T08:57:06.3353464Z  * [new branch]              gh/yushangdi/7/orig         -> origin/gh/yushangdi/7/orig
2025-12-04T08:57:06.3355848Z  * [new branch]              gh/yushangdi/8/base         -> origin/gh/yushangdi/8/base
2025-12-04T08:57:06.3357563Z  * [new branch]              gh/yushangdi/8/head         -> origin/gh/yushangdi/8/head
2025-12-04T08:57:06.3359271Z  * [new branch]              gh/yushangdi/8/orig         -> origin/gh/yushangdi/8/orig
2025-12-04T08:57:06.3361478Z  * [new branch]              gh/yushangdi/9/base         -> origin/gh/yushangdi/9/base
2025-12-04T08:57:06.3363063Z  * [new branch]              gh/yushangdi/9/head         -> origin/gh/yushangdi/9/head
2025-12-04T08:57:06.3364748Z  * [new branch]              gh/yushangdi/9/orig         -> origin/gh/yushangdi/9/orig
2025-12-04T08:57:06.3367400Z  * [new branch]              gh/zklaus/19/base           -> origin/gh/zklaus/19/base
2025-12-04T08:57:06.3368958Z  * [new branch]              gh/zklaus/19/head           -> origin/gh/zklaus/19/head
2025-12-04T08:57:06.3370898Z  * [new branch]              gh/zklaus/19/orig           -> origin/gh/zklaus/19/orig
2025-12-04T08:57:06.3373011Z  * [new branch]              gh/zklaus/20/base           -> origin/gh/zklaus/20/base
2025-12-04T08:57:06.3374602Z  * [new branch]              gh/zklaus/20/head           -> origin/gh/zklaus/20/head
2025-12-04T08:57:06.3376218Z  * [new branch]              gh/zklaus/20/orig           -> origin/gh/zklaus/20/orig
2025-12-04T08:57:06.3378360Z  * [new branch]              gh/zklaus/21/base           -> origin/gh/zklaus/21/base
2025-12-04T08:57:06.3379954Z  * [new branch]              gh/zklaus/21/head           -> origin/gh/zklaus/21/head
2025-12-04T08:57:06.3381556Z  * [new branch]              gh/zklaus/21/orig           -> origin/gh/zklaus/21/orig
2025-12-04T08:57:06.3383661Z  * [new branch]              gh/zklaus/22/base           -> origin/gh/zklaus/22/base
2025-12-04T08:57:06.3385241Z  * [new branch]              gh/zklaus/22/head           -> origin/gh/zklaus/22/head
2025-12-04T08:57:06.3386797Z  * [new branch]              gh/zklaus/22/orig           -> origin/gh/zklaus/22/orig
2025-12-04T08:57:06.3388967Z  * [new branch]              gh/zklaus/23/base           -> origin/gh/zklaus/23/base
2025-12-04T08:57:06.3390827Z  * [new branch]              gh/zklaus/23/head           -> origin/gh/zklaus/23/head
2025-12-04T08:57:06.3392979Z  * [new branch]              gh/zklaus/23/orig           -> origin/gh/zklaus/23/orig
2025-12-04T08:57:06.3395077Z  * [new branch]              gh/zklaus/24/base           -> origin/gh/zklaus/24/base
2025-12-04T08:57:06.3396670Z  * [new branch]              gh/zklaus/24/head           -> origin/gh/zklaus/24/head
2025-12-04T08:57:06.3398310Z  * [new branch]              gh/zklaus/24/orig           -> origin/gh/zklaus/24/orig
2025-12-04T08:57:06.3401102Z  * [new branch]              gh/zou3519/1197/base        -> origin/gh/zou3519/1197/base
2025-12-04T08:57:06.3402728Z  * [new branch]              gh/zou3519/1197/head        -> origin/gh/zou3519/1197/head
2025-12-04T08:57:06.3404258Z  * [new branch]              gh/zou3519/1197/orig        -> origin/gh/zou3519/1197/orig
2025-12-04T08:57:06.3406611Z  * [new branch]              gh/zou3519/1199/base        -> origin/gh/zou3519/1199/base
2025-12-04T08:57:06.3408678Z  * [new branch]              gh/zou3519/1199/head        -> origin/gh/zou3519/1199/head
2025-12-04T08:57:06.3410341Z  * [new branch]              gh/zou3519/1199/orig        -> origin/gh/zou3519/1199/orig
2025-12-04T08:57:06.3412695Z  * [new branch]              gh/zou3519/1200/base        -> origin/gh/zou3519/1200/base
2025-12-04T08:57:06.3414303Z  * [new branch]              gh/zou3519/1200/head        -> origin/gh/zou3519/1200/head
2025-12-04T08:57:06.3416257Z  * [new branch]              gh/zou3519/1200/orig        -> origin/gh/zou3519/1200/orig
2025-12-04T08:57:06.3418615Z  * [new branch]              gh/zou3519/1201/base        -> origin/gh/zou3519/1201/base
2025-12-04T08:57:06.3420244Z  * [new branch]              gh/zou3519/1201/head        -> origin/gh/zou3519/1201/head
2025-12-04T08:57:06.3421860Z  * [new branch]              gh/zou3519/1201/orig        -> origin/gh/zou3519/1201/orig
2025-12-04T08:57:06.3423813Z  * [new branch]              gh/zou3519/1202/base        -> origin/gh/zou3519/1202/base
2025-12-04T08:57:06.3425450Z  * [new branch]              gh/zou3519/1202/head        -> origin/gh/zou3519/1202/head
2025-12-04T08:57:06.3427020Z  * [new branch]              gh/zou3519/1202/orig        -> origin/gh/zou3519/1202/orig
2025-12-04T08:57:06.3429613Z  * [new branch]              gh/zpcore/1/base            -> origin/gh/zpcore/1/base
2025-12-04T08:57:06.3431224Z  * [new branch]              gh/zpcore/1/head            -> origin/gh/zpcore/1/head
2025-12-04T08:57:06.3433411Z  * [new branch]              gh/zpcore/11/base           -> origin/gh/zpcore/11/base
2025-12-04T08:57:06.3435055Z  * [new branch]              gh/zpcore/11/head           -> origin/gh/zpcore/11/head
2025-12-04T08:57:06.3436678Z  * [new branch]              gh/zpcore/11/orig           -> origin/gh/zpcore/11/orig
2025-12-04T08:57:06.3439152Z  * [new branch]              gh/zpcore/12/base           -> origin/gh/zpcore/12/base
2025-12-04T08:57:06.3440871Z  * [new branch]              gh/zpcore/12/head           -> origin/gh/zpcore/12/head
2025-12-04T08:57:06.3442476Z  * [new branch]              gh/zpcore/12/orig           -> origin/gh/zpcore/12/orig
2025-12-04T08:57:06.3444722Z  * [new branch]              gh/zpcore/13/base           -> origin/gh/zpcore/13/base
2025-12-04T08:57:06.3446273Z  * [new branch]              gh/zpcore/13/head           -> origin/gh/zpcore/13/head
2025-12-04T08:57:06.3447793Z  * [new branch]              gh/zpcore/13/orig           -> origin/gh/zpcore/13/orig
2025-12-04T08:57:06.3449938Z  * [new branch]              gh/zpcore/14/base           -> origin/gh/zpcore/14/base
2025-12-04T08:57:06.3451506Z  * [new branch]              gh/zpcore/14/head           -> origin/gh/zpcore/14/head
2025-12-04T08:57:06.3453129Z  * [new branch]              gh/zpcore/14/orig           -> origin/gh/zpcore/14/orig
2025-12-04T08:57:06.3455432Z  * [new branch]              gh/zpcore/15/base           -> origin/gh/zpcore/15/base
2025-12-04T08:57:06.3456984Z  * [new branch]              gh/zpcore/15/head           -> origin/gh/zpcore/15/head
2025-12-04T08:57:06.3458595Z  * [new branch]              gh/zpcore/15/orig           -> origin/gh/zpcore/15/orig
2025-12-04T08:57:06.3460693Z  * [new branch]              gh/zpcore/2/base            -> origin/gh/zpcore/2/base
2025-12-04T08:57:06.3462281Z  * [new branch]              gh/zpcore/2/head            -> origin/gh/zpcore/2/head
2025-12-04T08:57:06.3464826Z  * [new branch]              gh/zpcore/21/base           -> origin/gh/zpcore/21/base
2025-12-04T08:57:06.3466496Z  * [new branch]              gh/zpcore/21/head           -> origin/gh/zpcore/21/head
2025-12-04T08:57:06.3468097Z  * [new branch]              gh/zpcore/21/orig           -> origin/gh/zpcore/21/orig
2025-12-04T08:57:06.3470555Z  * [new branch]              gh/zpcore/22/base           -> origin/gh/zpcore/22/base
2025-12-04T08:57:06.3472453Z  * [new branch]              gh/zpcore/22/head           -> origin/gh/zpcore/22/head
2025-12-04T08:57:06.3474080Z  * [new branch]              gh/zpcore/22/orig           -> origin/gh/zpcore/22/orig
2025-12-04T08:57:06.3476345Z  * [new branch]              gh/zpcore/23/base           -> origin/gh/zpcore/23/base
2025-12-04T08:57:06.3478353Z  * [new branch]              gh/zpcore/23/head           -> origin/gh/zpcore/23/head
2025-12-04T08:57:06.3480092Z  * [new branch]              gh/zpcore/23/orig           -> origin/gh/zpcore/23/orig
2025-12-04T08:57:06.3482158Z  * [new branch]              gh/zpcore/24/base           -> origin/gh/zpcore/24/base
2025-12-04T08:57:06.3484185Z  * [new branch]              gh/zpcore/24/head           -> origin/gh/zpcore/24/head
2025-12-04T08:57:06.3485769Z  * [new branch]              gh/zpcore/24/orig           -> origin/gh/zpcore/24/orig
2025-12-04T08:57:06.3488141Z  * [new branch]              gh/zpcore/25/base           -> origin/gh/zpcore/25/base
2025-12-04T08:57:06.3489754Z  * [new branch]              gh/zpcore/25/head           -> origin/gh/zpcore/25/head
2025-12-04T08:57:06.3491580Z  * [new branch]              gh/zpcore/25/orig           -> origin/gh/zpcore/25/orig
2025-12-04T08:57:06.3493810Z  * [new branch]              gh/zpcore/26/base           -> origin/gh/zpcore/26/base
2025-12-04T08:57:06.3495488Z  * [new branch]              gh/zpcore/26/head           -> origin/gh/zpcore/26/head
2025-12-04T08:57:06.3496933Z  * [new branch]              gh/zpcore/26/orig           -> origin/gh/zpcore/26/orig
2025-12-04T08:57:06.3499160Z  * [new branch]              gh/zpcore/27/base           -> origin/gh/zpcore/27/base
2025-12-04T08:57:06.3500704Z  * [new branch]              gh/zpcore/27/head           -> origin/gh/zpcore/27/head
2025-12-04T08:57:06.3502329Z  * [new branch]              gh/zpcore/27/orig           -> origin/gh/zpcore/27/orig
2025-12-04T08:57:06.3504895Z  * [new branch]              gh/zpcore/28/base           -> origin/gh/zpcore/28/base
2025-12-04T08:57:06.3506770Z  * [new branch]              gh/zpcore/28/head           -> origin/gh/zpcore/28/head
2025-12-04T08:57:06.3508403Z  * [new branch]              gh/zpcore/28/orig           -> origin/gh/zpcore/28/orig
2025-12-04T08:57:06.3510461Z  * [new branch]              gh/zpcore/3/base            -> origin/gh/zpcore/3/base
2025-12-04T08:57:06.3512059Z  * [new branch]              gh/zpcore/3/head            -> origin/gh/zpcore/3/head
2025-12-04T08:57:06.3513954Z  * [new branch]              gh/zpcore/4/base            -> origin/gh/zpcore/4/base
2025-12-04T08:57:06.3515507Z  * [new branch]              gh/zpcore/4/head            -> origin/gh/zpcore/4/head
2025-12-04T08:57:06.3518181Z  * [new branch]              gh/zpcore/5/base            -> origin/gh/zpcore/5/base
2025-12-04T08:57:06.3519829Z  * [new branch]              gh/zpcore/5/head            -> origin/gh/zpcore/5/head
2025-12-04T08:57:06.3521873Z  * [new branch]              gh/zpcore/6/base            -> origin/gh/zpcore/6/base
2025-12-04T08:57:06.3523356Z  * [new branch]              gh/zpcore/6/head            -> origin/gh/zpcore/6/head
2025-12-04T08:57:06.3525765Z  * [new branch]              gh/zpcore/7/base            -> origin/gh/zpcore/7/base
2025-12-04T08:57:06.3527304Z  * [new branch]              gh/zpcore/7/head            -> origin/gh/zpcore/7/head
2025-12-04T08:57:06.3529367Z  * [new branch]              gh/zpcore/8/base            -> origin/gh/zpcore/8/base
2025-12-04T08:57:06.3531250Z  * [new branch]              gh/zpcore/8/head            -> origin/gh/zpcore/8/head
2025-12-04T08:57:06.3532990Z  * [new branch]              google-main                 -> origin/google-main
2025-12-04T08:57:06.3535071Z  * [new branch]              guangyey/external_stream    -> origin/guangyey/external_stream
2025-12-04T08:57:06.3536551Z  * [new branch]              guangyey/test_2025          -> origin/guangyey/test_2025
2025-12-04T08:57:06.3539185Z  * [new branch]              guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9
2025-12-04T08:57:06.3540988Z  * [new branch]              hameerabbasi/complex_tensor_subclass -> origin/hameerabbasi/complex_tensor_subclass
2025-12-04T08:57:06.3542583Z  * [new branch]              hameerabbasi/fix-ctensor-gradcheck-tests -> origin/hameerabbasi/fix-ctensor-gradcheck-tests
2025-12-04T08:57:06.3544048Z  * [new branch]              hameerabbasi/gradcheck-allclose -> origin/hameerabbasi/gradcheck-allclose
2025-12-04T08:57:06.3545622Z  * [new branch]              hc_baseline                 -> origin/hc_baseline
2025-12-04T08:57:06.3547177Z  * [new branch]              hhh_rand                    -> origin/hhh_rand
2025-12-04T08:57:06.3549276Z  * [new branch]              huba/f1                     -> origin/huba/f1
2025-12-04T08:57:06.3551566Z  * [new branch]              increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test -> origin/increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test
2025-12-04T08:57:06.3553122Z  * [new branch]              inlining                    -> origin/inlining
2025-12-04T08:57:06.3554634Z  * [new branch]              inlining-ezyang             -> origin/inlining-ezyang
2025-12-04T08:57:06.3556333Z  * [new branch]              install-torchao-0.13.0      -> origin/install-torchao-0.13.0
2025-12-04T08:57:06.3558373Z  * [new branch]              instrument-trunk-pull-linux-with-job-test-filters -> origin/instrument-trunk-pull-linux-with-job-test-filters
2025-12-04T08:57:06.3559735Z  * [new branch]              invoke-subgraph             -> origin/invoke-subgraph
2025-12-04T08:57:06.3561615Z  * [new branch]              issue#58739                 -> origin/issue#58739
2025-12-04T08:57:06.3563337Z  * [new branch]              jainapurva-patch-1          -> origin/jainapurva-patch-1
2025-12-04T08:57:06.3565478Z  * [new branch]              jathu/o3                    -> origin/jathu/o3
2025-12-04T08:57:06.3566992Z  * [new branch]              jathu/sve                   -> origin/jathu/sve
2025-12-04T08:57:06.3569287Z  * [new branch]              jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2
2025-12-04T08:57:06.3570785Z  * [new branch]              jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2
2025-12-04T08:57:06.3572951Z  * [new branch]              jiannanWang/memorysnapshot_filter -> origin/jiannanWang/memorysnapshot_filter
2025-12-04T08:57:06.3574557Z  * [new branch]              jiannanWang/profilerstepwarning -> origin/jiannanWang/profilerstepwarning
2025-12-04T08:57:06.3576229Z  * [new branch]              jithunnair-amd-patch-1      -> origin/jithunnair-amd-patch-1
2025-12-04T08:57:06.3577927Z  * [new branch]              jithunnair-amd-patch-10     -> origin/jithunnair-amd-patch-10
2025-12-04T08:57:06.3579659Z  * [new branch]              jithunnair-amd-patch-2      -> origin/jithunnair-amd-patch-2
2025-12-04T08:57:06.3581681Z  * [new branch]              jithunnair-amd-patch-3      -> origin/jithunnair-amd-patch-3
2025-12-04T08:57:06.3583416Z  * [new branch]              jithunnair-amd-patch-4      -> origin/jithunnair-amd-patch-4
2025-12-04T08:57:06.3585060Z  * [new branch]              jithunnair-amd-patch-5      -> origin/jithunnair-amd-patch-5
2025-12-04T08:57:06.3586724Z  * [new branch]              jithunnair-amd-patch-6      -> origin/jithunnair-amd-patch-6
2025-12-04T08:57:06.3588512Z  * [new branch]              jithunnair-amd-patch-7      -> origin/jithunnair-amd-patch-7
2025-12-04T08:57:06.3590092Z  * [new branch]              jithunnair-amd-patch-8      -> origin/jithunnair-amd-patch-8
2025-12-04T08:57:06.3591721Z  * [new branch]              jithunnair-amd-patch-9      -> origin/jithunnair-amd-patch-9
2025-12-04T08:57:06.3593917Z  * [new branch]              justinchu/native-qdq        -> origin/justinchu/native-qdq
2025-12-04T08:57:06.3596110Z  * [new branch]              kainan666/xlf_debug         -> origin/kainan666/xlf_debug
2025-12-04T08:57:06.3597810Z  * [new branch]              kainan_test                 -> origin/kainan_test
2025-12-04T08:57:06.3599406Z  * [new branch]              larryliu0820-patch-1        -> origin/larryliu0820-patch-1
2025-12-04T08:57:06.3601743Z  * [new branch]              leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues
2025-12-04T08:57:06.3603813Z  * [new branch]              lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error
2025-12-04T08:57:06.3605877Z  * [new branch]              liaoxuan/shm_all_reduce     -> origin/liaoxuan/shm_all_reduce
2025-12-04T08:57:06.3607468Z  * [new branch]              liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax
2025-12-04T08:57:06.3608971Z  * [new branch]              liaoxuan/test_int8_sdpa     -> origin/liaoxuan/test_int8_sdpa
2025-12-04T08:57:06.3610480Z  * [new branch]              llama4-stable               -> origin/llama4-stable
2025-12-04T08:57:06.3613136Z  * [new branch]              lts/release/1.8             -> origin/lts/release/1.8
2025-12-04T08:57:06.3615280Z  * [new branch]              lucaskabela/#94773          -> origin/lucaskabela/#94773
2025-12-04T08:57:06.3616846Z  * [new branch]              lucaskabela/fix_164876      -> origin/lucaskabela/fix_164876
2025-12-04T08:57:06.3620060Z  * [new branch]              lucaskabela/flop_counter    -> origin/lucaskabela/flop_counter
2025-12-04T08:57:06.3621541Z  * [new branch]              lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp
2025-12-04T08:57:06.3623340Z  * [new branch]              lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo
2025-12-04T08:57:06.3624955Z  * [new branch]              lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr
2025-12-04T08:57:06.3626943Z  * [new branch]              lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr
2025-12-04T08:57:06.3629011Z  * [new branch]              lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata
2025-12-04T08:57:06.3630508Z  * [new branch]              lucaskabela/rnn_decomp      -> origin/lucaskabela/rnn_decomp
2025-12-04T08:57:06.3632136Z  * [new branch]              lucaskabela/typing_backends -> origin/lucaskabela/typing_backends
2025-12-04T08:57:06.3633893Z  * [new branch]              lucaskabela/typing_ctx_manager -> origin/lucaskabela/typing_ctx_manager
2025-12-04T08:57:06.3635517Z  * [new branch]              lucaskabela/typing_nn_module -> origin/lucaskabela/typing_nn_module
2025-12-04T08:57:06.3637131Z  * [new branch]              lucaskabela/typing_user_defined -> origin/lucaskabela/typing_user_defined
2025-12-04T08:57:06.3638735Z  * [new branch]              lucaskabela/typing_variables -> origin/lucaskabela/typing_variables
2025-12-04T08:57:06.3640386Z  * [new branch]              lucaskabela/typing_variables_dicts -> origin/lucaskabela/typing_variables_dicts
2025-12-04T08:57:06.3642072Z  * [new branch]              lucaskabela/typing_variables_functions -> origin/lucaskabela/typing_variables_functions
2025-12-04T08:57:06.3643709Z  * [new branch]              lucaskabela/typing_variables_lists -> origin/lucaskabela/typing_variables_lists
2025-12-04T08:57:06.3645879Z  * [new branch]              lw/torch_box_by_ref         -> origin/lw/torch_box_by_ref
2025-12-04T08:57:06.3647562Z  * [new branch]              main                        -> origin/main
2025-12-04T08:57:06.3649473Z  * [new branch]              malfet-patch-1              -> origin/malfet-patch-1
2025-12-04T08:57:06.3651471Z  * [new branch]              malfet-patch-2              -> origin/malfet-patch-2
2025-12-04T08:57:06.3653314Z  * [new branch]              malfet-patch-3              -> origin/malfet-patch-3
2025-12-04T08:57:06.3655015Z  * [new branch]              malfet-patch-4              -> origin/malfet-patch-4
2025-12-04T08:57:06.3656769Z  * [new branch]              malfet-patch-5              -> origin/malfet-patch-5
2025-12-04T08:57:06.3658614Z  * [new branch]              malfet-patch-6              -> origin/malfet-patch-6
2025-12-04T08:57:06.3660140Z  * [new branch]              malfet-patch-7              -> origin/malfet-patch-7
2025-12-04T08:57:06.3661706Z  * [new branch]              malfet-patch-8              -> origin/malfet-patch-8
2025-12-04T08:57:06.3663888Z  * [new branch]              malfet/add-3.14-ci          -> origin/malfet/add-3.14-ci
2025-12-04T08:57:06.3665613Z  * [new branch]              malfet/be-do-not-make-typos-in-build-artifacts -> origin/malfet/be-do-not-make-typos-in-build-artifacts
2025-12-04T08:57:06.3667238Z  * [new branch]              malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch
2025-12-04T08:57:06.3668933Z  * [new branch]              malfet/be-remove-misisng-neon-headers -> origin/malfet/be-remove-misisng-neon-headers
2025-12-04T08:57:06.3670727Z  * [new branch]              malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im
2025-12-04T08:57:06.3673347Z  * [new branch]              manuel/aoti_metal_shimify-thread_safe -> origin/manuel/aoti_metal_shimify-thread_safe
2025-12-04T08:57:06.3674860Z  * [new branch]              manuel/inductor_link_openmp -> origin/manuel/inductor_link_openmp
2025-12-04T08:57:06.3676929Z  * [new branch]              masnesral/metaconda         -> origin/masnesral/metaconda
2025-12-04T08:57:06.3678603Z  * [new branch]              mem_profiler_flaky_fix      -> origin/mem_profiler_flaky_fix
2025-12-04T08:57:06.3680379Z  * [new branch]              mem_profiler_stack_trace    -> origin/mem_profiler_stack_trace
2025-12-04T08:57:06.3682109Z  * [new branch]              memory_profiler_stack       -> origin/memory_profiler_stack
2025-12-04T08:57:06.3683829Z  * [new branch]              metascroy-patch-1           -> origin/metascroy-patch-1
2025-12-04T08:57:06.3685481Z  * [new branch]              mingw_posix                 -> origin/mingw_posix
2025-12-04T08:57:06.3687751Z  * [new branch]              mlazos/S429861-debug        -> origin/mlazos/S429861-debug
2025-12-04T08:57:06.3689257Z  * [new branch]              mlazos/aa                   -> origin/mlazos/aa
2025-12-04T08:57:06.3690750Z  * [new branch]              mlazos/acts                 -> origin/mlazos/acts
2025-12-04T08:57:06.3692193Z  * [new branch]              mlazos/arg-renames          -> origin/mlazos/arg-renames
2025-12-04T08:57:06.3693694Z  * [new branch]              mlazos/bad-cudagraphs       -> origin/mlazos/bad-cudagraphs
2025-12-04T08:57:06.3695318Z  * [new branch]              mlazos/baseline-graph-breaks -> origin/mlazos/baseline-graph-breaks
2025-12-04T08:57:06.3696734Z  * [new branch]              mlazos/beta-tensor          -> origin/mlazos/beta-tensor
2025-12-04T08:57:06.3698221Z  * [new branch]              mlazos/buffers              -> origin/mlazos/buffers
2025-12-04T08:57:06.3699561Z  * [new branch]              mlazos/buffers2             -> origin/mlazos/buffers2
2025-12-04T08:57:06.3701365Z  * [new branch]              mlazos/buffers3             -> origin/mlazos/buffers3
2025-12-04T08:57:06.3703191Z  * [new branch]              mlazos/bwd                  -> origin/mlazos/bwd
2025-12-04T08:57:06.3704721Z  * [new branch]              mlazos/combo-test           -> origin/mlazos/combo-test
2025-12-04T08:57:06.3706347Z  * [new branch]              mlazos/ctx-cleanup          -> origin/mlazos/ctx-cleanup
2025-12-04T08:57:06.3708024Z  * [new branch]              mlazos/cuda-cmd-log         -> origin/mlazos/cuda-cmd-log
2025-12-04T08:57:06.3709820Z  * [new branch]              mlazos/cudagraph-tests      -> origin/mlazos/cudagraph-tests
2025-12-04T08:57:06.3711440Z  * [new branch]              mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement
2025-12-04T08:57:06.3713092Z  * [new branch]              mlazos/cutlass-test         -> origin/mlazos/cutlass-test
2025-12-04T08:57:06.3714676Z  * [new branch]              mlazos/cutlass-topo-bug     -> origin/mlazos/cutlass-topo-bug
2025-12-04T08:57:06.3716389Z  * [new branch]              mlazos/dataclass-proxy      -> origin/mlazos/dataclass-proxy
2025-12-04T08:57:06.3718093Z  * [new branch]              mlazos/dc-attrs             -> origin/mlazos/dc-attrs
2025-12-04T08:57:06.3719703Z  * [new branch]              mlazos/dc-helion            -> origin/mlazos/dc-helion
2025-12-04T08:57:06.3721472Z  * [new branch]              mlazos/dict-fix             -> origin/mlazos/dict-fix
2025-12-04T08:57:06.3723070Z  * [new branch]              mlazos/disable-tf           -> origin/mlazos/disable-tf
2025-12-04T08:57:06.3724638Z  * [new branch]              mlazos/dupe-fix             -> origin/mlazos/dupe-fix
2025-12-04T08:57:06.3726350Z  * [new branch]              mlazos/dyn-batch            -> origin/mlazos/dyn-batch
2025-12-04T08:57:06.3727870Z  * [new branch]              mlazos/evt                  -> origin/mlazos/evt
2025-12-04T08:57:06.3729538Z  * [new branch]              mlazos/extract-examples     -> origin/mlazos/extract-examples
2025-12-04T08:57:06.3731100Z  * [new branch]              mlazos/foreach-op           -> origin/mlazos/foreach-op
2025-12-04T08:57:06.3733145Z  * [new branch]              mlazos/fp8                  -> origin/mlazos/fp8
2025-12-04T08:57:06.3734833Z  * [new branch]              mlazos/fp8-bias             -> origin/mlazos/fp8-bias
2025-12-04T08:57:06.3736500Z  * [new branch]              mlazos/fp8-bias-fusion      -> origin/mlazos/fp8-bias-fusion
2025-12-04T08:57:06.3738084Z  * [new branch]              mlazos/fp8-fixes            -> origin/mlazos/fp8-fixes
2025-12-04T08:57:06.3739872Z  * [new branch]              mlazos/freezing             -> origin/mlazos/freezing
2025-12-04T08:57:06.3741422Z  * [new branch]              mlazos/h-comp               -> origin/mlazos/h-comp
2025-12-04T08:57:06.3743120Z  * [new branch]              mlazos/h-comp2              -> origin/mlazos/h-comp2
2025-12-04T08:57:06.3745309Z  * [new branch]              mlazos/hash-hop             -> origin/mlazos/hash-hop
2025-12-04T08:57:06.3747006Z  * [new branch]              mlazos/hc                   -> origin/mlazos/hc
2025-12-04T08:57:06.3748620Z  * [new branch]              mlazos/hc-cycles            -> origin/mlazos/hc-cycles
2025-12-04T08:57:06.3750233Z  * [new branch]              mlazos/hc-fixes             -> origin/mlazos/hc-fixes
2025-12-04T08:57:06.3751829Z  * [new branch]              mlazos/hc-fixes3            -> origin/mlazos/hc-fixes3
2025-12-04T08:57:06.3753412Z  * [new branch]              mlazos/hc-fixes4            -> origin/mlazos/hc-fixes4
2025-12-04T08:57:06.3755048Z  * [new branch]              mlazos/hc-hf                -> origin/mlazos/hc-hf
2025-12-04T08:57:06.3756663Z  * [new branch]              mlazos/hc-mut               -> origin/mlazos/hc-mut
2025-12-04T08:57:06.3758343Z  * [new branch]              mlazos/hc10                 -> origin/mlazos/hc10
2025-12-04T08:57:06.3760004Z  * [new branch]              mlazos/hc11                 -> origin/mlazos/hc11
2025-12-04T08:57:06.3761851Z  * [new branch]              mlazos/hc12                 -> origin/mlazos/hc12
2025-12-04T08:57:06.3763370Z  * [new branch]              mlazos/hc13                 -> origin/mlazos/hc13
2025-12-04T08:57:06.3764944Z  * [new branch]              mlazos/hc14                 -> origin/mlazos/hc14
2025-12-04T08:57:06.3766525Z  * [new branch]              mlazos/hc15                 -> origin/mlazos/hc15
2025-12-04T08:57:06.3768173Z  * [new branch]              mlazos/hc2                  -> origin/mlazos/hc2
2025-12-04T08:57:06.3769801Z  * [new branch]              mlazos/hc4                  -> origin/mlazos/hc4
2025-12-04T08:57:06.3771365Z  * [new branch]              mlazos/hc5                  -> origin/mlazos/hc5
2025-12-04T08:57:06.3772986Z  * [new branch]              mlazos/hc6                  -> origin/mlazos/hc6
2025-12-04T08:57:06.3774583Z  * [new branch]              mlazos/hc7                  -> origin/mlazos/hc7
2025-12-04T08:57:06.3776101Z  * [new branch]              mlazos/hc8                  -> origin/mlazos/hc8
2025-12-04T08:57:06.3777890Z  * [new branch]              mlazos/hc9                  -> origin/mlazos/hc9
2025-12-04T08:57:06.3779467Z  * [new branch]              mlazos/hc_baseline2         -> origin/mlazos/hc_baseline2
2025-12-04T08:57:06.3780963Z  * [new branch]              mlazos/inductor-streams     -> origin/mlazos/inductor-streams
2025-12-04T08:57:06.3782413Z  * [new branch]              mlazos/main                 -> origin/mlazos/main
2025-12-04T08:57:06.3784011Z  * [new branch]              mlazos/mcg2                 -> origin/mlazos/mcg2
2025-12-04T08:57:06.3785655Z  * [new branch]              mlazos/meta-guards          -> origin/mlazos/meta-guards
2025-12-04T08:57:06.3787903Z  * [new branch]              mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam
2025-12-04T08:57:06.3789512Z  * [new branch]              mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup
2025-12-04T08:57:06.3791086Z  * [new branch]              mlazos/mod-fix              -> origin/mlazos/mod-fix
2025-12-04T08:57:06.3792847Z  * [new branch]              mlazos/mode-fix             -> origin/mlazos/mode-fix
2025-12-04T08:57:06.3794453Z  * [new branch]              mlazos/offsets              -> origin/mlazos/offsets
2025-12-04T08:57:06.3795976Z  * [new branch]              mlazos/overguarding         -> origin/mlazos/overguarding
2025-12-04T08:57:06.3797624Z  * [new branch]              mlazos/proxy-ctors          -> origin/mlazos/proxy-ctors
2025-12-04T08:57:06.3799232Z  * [new branch]              mlazos/quant-fix            -> origin/mlazos/quant-fix
2025-12-04T08:57:06.3800991Z  * [new branch]              mlazos/resnet-fix           -> origin/mlazos/resnet-fix
2025-12-04T08:57:06.3802605Z  * [new branch]              mlazos/rm-buf-names         -> origin/mlazos/rm-buf-names
2025-12-04T08:57:06.3804213Z  * [new branch]              mlazos/rm-code              -> origin/mlazos/rm-code
2025-12-04T08:57:06.3805814Z  * [new branch]              mlazos/rm-spam              -> origin/mlazos/rm-spam
2025-12-04T08:57:06.3807431Z  * [new branch]              mlazos/rtp                  -> origin/mlazos/rtp
2025-12-04T08:57:06.3809109Z  * [new branch]              mlazos/static-idx-dbg       -> origin/mlazos/static-idx-dbg
2025-12-04T08:57:06.3810737Z  * [new branch]              mlazos/static-inputs-log    -> origin/mlazos/static-inputs-log
2025-12-04T08:57:06.3812274Z  * [new branch]              mlazos/stests               -> origin/mlazos/stests
2025-12-04T08:57:06.3813983Z  * [new branch]              mlazos/stream-ops           -> origin/mlazos/stream-ops
2025-12-04T08:57:06.3815579Z  * [new branch]              mlazos/td-fix2              -> origin/mlazos/td-fix2
2025-12-04T08:57:06.3817342Z  * [new branch]              mlazos/tensor-hasattr2      -> origin/mlazos/tensor-hasattr2
2025-12-04T08:57:06.3818979Z  * [new branch]              mlazos/test                 -> origin/mlazos/test
2025-12-04T08:57:06.3820549Z  * [new branch]              mlazos/tf-mode              -> origin/mlazos/tf-mode
2025-12-04T08:57:06.3822185Z  * [new branch]              mlazos/tf-mode-backup2      -> origin/mlazos/tf-mode-backup2
2025-12-04T08:57:06.3823871Z  * [new branch]              mlazos/tf-mode-reland       -> origin/mlazos/tf-mode-reland
2025-12-04T08:57:06.3825536Z  * [new branch]              mlazos/tf-mode-reland2      -> origin/mlazos/tf-mode-reland2
2025-12-04T08:57:06.3827139Z  * [new branch]              mlazos/tf-mode-reland3      -> origin/mlazos/tf-mode-reland3
2025-12-04T08:57:06.3828738Z  * [new branch]              mlazos/triton-no-epi        -> origin/mlazos/triton-no-epi
2025-12-04T08:57:06.3830797Z  * [new branch]              mlazos/tune-proto           -> origin/mlazos/tune-proto
2025-12-04T08:57:06.3832321Z  * [new branch]              mlazos/tuple-fixes          -> origin/mlazos/tuple-fixes
2025-12-04T08:57:06.3833997Z  * [new branch]              mlazos/tuple-fixes2         -> origin/mlazos/tuple-fixes2
2025-12-04T08:57:06.3835850Z  * [new branch]              mlazos/tuple-handling       -> origin/mlazos/tuple-handling
2025-12-04T08:57:06.3837218Z  * [new branch]              mlazos/user-stream-base     -> origin/mlazos/user-stream-base
2025-12-04T08:57:06.3838950Z  * [new branch]              mlazos/user-streams         -> origin/mlazos/user-streams
2025-12-04T08:57:06.3840518Z  * [new branch]              mlazos/user-streams-backup  -> origin/mlazos/user-streams-backup
2025-12-04T08:57:06.3842668Z  * [new branch]              mlazos/user-streams-backup2 -> origin/mlazos/user-streams-backup2
2025-12-04T08:57:06.3844328Z  * [new branch]              mlazos/vary-beta            -> origin/mlazos/vary-beta
2025-12-04T08:57:06.3845875Z  * [new branch]              mlazos/vary-beta2           -> origin/mlazos/vary-beta2
2025-12-04T08:57:06.3847511Z  * [new branch]              mlazos/weird-perf1          -> origin/mlazos/weird-perf1
2025-12-04T08:57:06.3849202Z  * [new branch]              mm_out_dtype_compile        -> origin/mm_out_dtype_compile
2025-12-04T08:57:06.3850937Z  * [new branch]              module-shim                 -> origin/module-shim
2025-12-04T08:57:06.3852507Z  * [new branch]              move_config                 -> origin/move_config
2025-12-04T08:57:06.3854640Z  * [new branch]              msaroufim/reduce            -> origin/msaroufim/reduce
2025-12-04T08:57:06.3856744Z  * [new branch]              mtia/basic-cmake            -> origin/mtia/basic-cmake
2025-12-04T08:57:06.3858987Z  * [new branch]              mwizak/fix-triton-block-shape -> origin/mwizak/fix-triton-block-shape
2025-12-04T08:57:06.3860529Z  * [new branch]              my_varlen_backup            -> origin/my_varlen_backup
2025-12-04T08:57:06.3862274Z  * [new branch]              nativert_num_outputs        -> origin/nativert_num_outputs
2025-12-04T08:57:06.3863956Z  * [new branch]              new-codegen                 -> origin/new-codegen
2025-12-04T08:57:06.3865645Z  * [new branch]              newtest-base                -> origin/newtest-base
2025-12-04T08:57:06.3867804Z  * [new branch]              ngimel/addmm_dtype          -> origin/ngimel/addmm_dtype
2025-12-04T08:57:06.3869306Z  * [new branch]              ngimel/div_inv              -> origin/ngimel/div_inv
2025-12-04T08:57:06.3870844Z  * [new branch]              ngimel/error_index_list     -> origin/ngimel/error_index_list
2025-12-04T08:57:06.3872268Z  * [new branch]              ngimel/gather_grid          -> origin/ngimel/gather_grid
2025-12-04T08:57:06.3873908Z  * [new branch]              ngimel/gather_grid_release  -> origin/ngimel/gather_grid_release
2025-12-04T08:57:06.3875327Z  * [new branch]              ngimel/gg_new               -> origin/ngimel/gg_new
2025-12-04T08:57:06.3876827Z  * [new branch]              ngimel/hostalloc            -> origin/ngimel/hostalloc
2025-12-04T08:57:06.3878334Z  * [new branch]              ngimel/storage_id           -> origin/ngimel/storage_id
2025-12-04T08:57:06.3879942Z  * [new branch]              nightly                     -> origin/nightly
2025-12-04T08:57:06.3882301Z  * [new branch]              nikitaved/addmm_1_rowcol_lt_path_check -> origin/nikitaved/addmm_1_rowcol_lt_path_check
2025-12-04T08:57:06.3883828Z  * [new branch]              nikitaved/addmm_epilogue_fusions_2d_bias -> origin/nikitaved/addmm_epilogue_fusions_2d_bias
2025-12-04T08:57:06.3885490Z  * [new branch]              nikitaved/addmm_epilogue_fusions_inductor -> origin/nikitaved/addmm_epilogue_fusions_inductor
2025-12-04T08:57:06.3887272Z  * [new branch]              nikitaved/addmm_epilogue_fusions_scratch -> origin/nikitaved/addmm_epilogue_fusions_scratch
2025-12-04T08:57:06.3889132Z  * [new branch]              nikitaved/grad_addmm_epilogue_fusions -> origin/nikitaved/grad_addmm_epilogue_fusions
2025-12-04T08:57:06.3891390Z  * [new branch]              nikitaved/simpler_can_use_32bit_index -> origin/nikitaved/simpler_can_use_32bit_index
2025-12-04T08:57:06.3893031Z  * [new branch]              nikitaved/test              -> origin/nikitaved/test
2025-12-04T08:57:06.3895180Z  * [new branch]              nmacchioni-perf-test-async-autotune -> origin/nmacchioni-perf-test-async-autotune
2025-12-04T08:57:06.3897024Z  * [new branch]              no_distributed_log_spew     -> origin/no_distributed_log_spew
2025-12-04T08:57:06.3898775Z  * [new branch]              nofun-hack                  -> origin/nofun-hack
2025-12-04T08:57:06.3900167Z  * [new branch]              norm_bench                  -> origin/norm_bench
2025-12-04T08:57:06.3902281Z  * [new branch]              nullplay/fuse_matmul        -> origin/nullplay/fuse_matmul
2025-12-04T08:57:06.3903962Z  * [new branch]              nullplay_fuse_matmul        -> origin/nullplay_fuse_matmul
2025-12-04T08:57:06.3905652Z  * [new branch]              optimizer_test              -> origin/optimizer_test
2025-12-04T08:57:06.3908287Z  * [new branch]              orig/release/1.10           -> origin/orig/release/1.10
2025-12-04T08:57:06.3909994Z  * [new branch]              orig/release/1.11           -> origin/orig/release/1.11
2025-12-04T08:57:06.3911635Z  * [new branch]              orig/release/1.12           -> origin/orig/release/1.12
2025-12-04T08:57:06.3913555Z  * [new branch]              orig/release/1.13           -> origin/orig/release/1.13
2025-12-04T08:57:06.3915168Z  * [new branch]              orig/release/1.6            -> origin/orig/release/1.6
2025-12-04T08:57:06.3916864Z  * [new branch]              orig/release/1.7            -> origin/orig/release/1.7
2025-12-04T08:57:06.3918828Z  * [new branch]              orig/release/1.8            -> origin/orig/release/1.8
2025-12-04T08:57:06.3920481Z  * [new branch]              orig/release/1.9            -> origin/orig/release/1.9
2025-12-04T08:57:06.3922097Z  * [new branch]              orig/release/2.0            -> origin/orig/release/2.0
2025-12-04T08:57:06.3923701Z  * [new branch]              orig/release/2.1            -> origin/orig/release/2.1
2025-12-04T08:57:06.3925304Z  * [new branch]              orig/release/2.2            -> origin/orig/release/2.2
2025-12-04T08:57:06.3926945Z  * [new branch]              orig/release/2.3            -> origin/orig/release/2.3
2025-12-04T08:57:06.3928444Z  * [new branch]              orig/release/2.4            -> origin/orig/release/2.4
2025-12-04T08:57:06.3929975Z  * [new branch]              orig/release/2.5            -> origin/orig/release/2.5
2025-12-04T08:57:06.3931574Z  * [new branch]              orig/release/2.6            -> origin/orig/release/2.6
2025-12-04T08:57:06.3933592Z  * [new branch]              orig/release/2.7            -> origin/orig/release/2.7
2025-12-04T08:57:06.3935781Z  * [new branch]              orig/release/2.8            -> origin/orig/release/2.8
2025-12-04T08:57:06.3937404Z  * [new branch]              orig/release/2.9            -> origin/orig/release/2.9
2025-12-04T08:57:06.3940874Z  * [new branch]              origin/gh/fxdawnn/1/base    -> origin/origin/gh/fxdawnn/1/base
2025-12-04T08:57:06.3942391Z  * [new branch]              origin/gh/fxdawnn/1/orig    -> origin/origin/gh/fxdawnn/1/orig
2025-12-04T08:57:06.3944987Z  * [new branch]              origin/gh/zpcore/14/orig    -> origin/origin/gh/zpcore/14/orig
2025-12-04T08:57:06.3946790Z  * [new branch]              oulgen-patch-1              -> origin/oulgen-patch-1
2025-12-04T08:57:06.3948513Z  * [new branch]              oulgen-patch-2              -> origin/oulgen-patch-2
2025-12-04T08:57:06.3950230Z  * [new branch]              oulgen-patch-3              -> origin/oulgen-patch-3
2025-12-04T08:57:06.3952386Z  * [new branch]              oulgen-patch-4              -> origin/oulgen-patch-4
2025-12-04T08:57:06.3954064Z  * [new branch]              padded-tensor               -> origin/padded-tensor
2025-12-04T08:57:06.3955811Z  * [new branch]              pca2                        -> origin/pca2
2025-12-04T08:57:06.3957669Z  * [new branch]              per_channel_backup          -> origin/per_channel_backup
2025-12-04T08:57:06.3959336Z  * [new branch]              perf_ops                    -> origin/perf_ops
2025-12-04T08:57:06.3961454Z  * [new branch]              perf_ops_2_9                -> origin/perf_ops_2_9
2025-12-04T08:57:06.3963487Z  * [new branch]              pianpwk-patch-1             -> origin/pianpwk-patch-1
2025-12-04T08:57:06.3965462Z  * [new branch]              pianpwk/__draft_debug_mode  -> origin/pianpwk/__draft_debug_mode
2025-12-04T08:57:06.3966799Z  * [new branch]              pianpwk/_debug_mode_for_triton_draft -> origin/pianpwk/_debug_mode_for_triton_draft
2025-12-04T08:57:06.3968360Z  * [new branch]              pianpwk/_debug_nn_module_compile -> origin/pianpwk/_debug_nn_module_compile
2025-12-04T08:57:06.3969872Z  * [new branch]              pianpwk/_draft_triton_11_3  -> origin/pianpwk/_draft_triton_11_3
2025-12-04T08:57:06.3971329Z  * [new branch]              pianpwk/_manual_bucket_draft -> origin/pianpwk/_manual_bucket_draft
2025-12-04T08:57:06.3973131Z  * [new branch]              pianpwk/_profile_w_dispatch_keys -> origin/pianpwk/_profile_w_dispatch_keys
2025-12-04T08:57:06.3975061Z  * [new branch]              pianpwk/_super_draft_debug_mode -> origin/pianpwk/_super_draft_debug_mode
2025-12-04T08:57:06.3977304Z  * [new branch]              pianpwk/_unbacked_local_shard_size -> origin/pianpwk/_unbacked_local_shard_size
2025-12-04T08:57:06.3978833Z  * [new branch]              pianpwk/anomaly_tb          -> origin/pianpwk/anomaly_tb
2025-12-04T08:57:06.3980462Z  * [new branch]              pianpwk/auto_fx_annotate    -> origin/pianpwk/auto_fx_annotate
2025-12-04T08:57:06.3982128Z  * [new branch]              pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export
2025-12-04T08:57:06.3983782Z  * [new branch]              pianpwk/bert_dynamic_perf   -> origin/pianpwk/bert_dynamic_perf
2025-12-04T08:57:06.3985491Z  * [new branch]              pianpwk/debug_fwd_stack_traces -> origin/pianpwk/debug_fwd_stack_traces
2025-12-04T08:57:06.3987114Z  * [new branch]              pianpwk/debug_hash_tensor   -> origin/pianpwk/debug_hash_tensor
2025-12-04T08:57:06.3988761Z  * [new branch]              pianpwk/debug_mode_annotate -> origin/pianpwk/debug_mode_annotate
2025-12-04T08:57:06.3990387Z  * [new branch]              pianpwk/debug_mode_defaults -> origin/pianpwk/debug_mode_defaults
2025-12-04T08:57:06.3991966Z  * [new branch]              pianpwk/debug_mode_hacks    -> origin/pianpwk/debug_mode_hacks
2025-12-04T08:57:06.3993613Z  * [new branch]              pianpwk/debug_mode_opcall_refactor -> origin/pianpwk/debug_mode_opcall_refactor
2025-12-04T08:57:06.3995215Z  * [new branch]              pianpwk/debug_mode_show_ids -> origin/pianpwk/debug_mode_show_ids
2025-12-04T08:57:06.3996897Z  * [new branch]              pianpwk/debug_mode_triton   -> origin/pianpwk/debug_mode_triton
2025-12-04T08:57:06.3998577Z  * [new branch]              pianpwk/debug_show_stack_trace -> origin/pianpwk/debug_show_stack_trace
2025-12-04T08:57:06.4000332Z  * [new branch]              pianpwk/debug_wait_on_collective -> origin/pianpwk/debug_wait_on_collective
2025-12-04T08:57:06.4001961Z  * [new branch]              pianpwk/debugmode_compile_tf -> origin/pianpwk/debugmode_compile_tf
2025-12-04T08:57:06.4004208Z  * [new branch]              pianpwk/dispatch_key_debugging_for_debug -> origin/pianpwk/dispatch_key_debugging_for_debug
2025-12-04T08:57:06.4005783Z  * [new branch]              pianpwk/draft_debug_mode_tfcompile -> origin/pianpwk/draft_debug_mode_tfcompile
2025-12-04T08:57:06.4007363Z  * [new branch]              pianpwk/draft_multikernel_nn -> origin/pianpwk/draft_multikernel_nn
2025-12-04T08:57:06.4009076Z  * [new branch]              pianpwk/draft_multikernel_status_10_5 -> origin/pianpwk/draft_multikernel_status_10_5
2025-12-04T08:57:06.4010805Z  * [new branch]              pianpwk/dtensor_custom_chunk -> origin/pianpwk/dtensor_custom_chunk
2025-12-04T08:57:06.4012540Z  * [new branch]              pianpwk/dtensor_unbacked_keypath -> origin/pianpwk/dtensor_unbacked_keypath
2025-12-04T08:57:06.4014197Z  * [new branch]              pianpwk/event_list_tree     -> origin/pianpwk/event_list_tree
2025-12-04T08:57:06.4015987Z  * [new branch]              pianpwk/false_numel_refs    -> origin/pianpwk/false_numel_refs
2025-12-04T08:57:06.4017556Z  * [new branch]              pianpwk/maybe_guard_rel     -> origin/pianpwk/maybe_guard_rel
2025-12-04T08:57:06.4020661Z  * [new branch]              pianpwk/multikernel_hints_draft -> origin/pianpwk/multikernel_hints_draft
2025-12-04T08:57:06.4022311Z  * [new branch]              pianpwk/no_size_oblivious_slice_scat -> origin/pianpwk/no_size_oblivious_slice_scat
2025-12-04T08:57:06.4023975Z  * [new branch]              pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better
2025-12-04T08:57:06.4025586Z  * [new branch]              pianpwk/pre_forward_hook    -> origin/pianpwk/pre_forward_hook
2025-12-04T08:57:06.4027216Z  * [new branch]              pianpwk/skip_python_keys_alternate -> origin/pianpwk/skip_python_keys_alternate
2025-12-04T08:57:06.4028840Z  * [new branch]              pianpwk/skip_python_keys_in_guards -> origin/pianpwk/skip_python_keys_in_guards
2025-12-04T08:57:06.4030516Z  * [new branch]              pianpwk/sym_tokens_draft    -> origin/pianpwk/sym_tokens_draft
2025-12-04T08:57:06.4032088Z  * [new branch]              pianpwk/symint_one_hot      -> origin/pianpwk/symint_one_hot
2025-12-04T08:57:06.4033814Z  * [new branch]              pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false
2025-12-04T08:57:06.4035397Z  * [new branch]              pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap
2025-12-04T08:57:06.4037130Z  * [new branch]              pianpwk/try_dumb_stuff      -> origin/pianpwk/try_dumb_stuff
2025-12-04T08:57:06.4039802Z  * [new branch]              pianpwk/try_dumb_stuff_2    -> origin/pianpwk/try_dumb_stuff_2
2025-12-04T08:57:06.4040675Z  * [new branch]              pianpwk/unbacked_dtensor_mm -> origin/pianpwk/unbacked_dtensor_mm
2025-12-04T08:57:06.4042202Z  * [new branch]              pianpwk/unbacked_tracing_12_2 -> origin/pianpwk/unbacked_tracing_12_2
2025-12-04T08:57:06.4043822Z  * [new branch]              pianpwk/user_symints        -> origin/pianpwk/user_symints
2025-12-04T08:57:06.4045472Z  * [new branch]              pianpwk/wan21_reshape       -> origin/pianpwk/wan21_reshape
2025-12-04T08:57:06.4048041Z  * [new branch]              piz/fix_partial_backward_1112 -> origin/piz/fix_partial_backward_1112
2025-12-04T08:57:06.4049548Z  * [new branch]              piz/prop_cache_clean        -> origin/piz/prop_cache_clean
2025-12-04T08:57:06.4051206Z  * [new branch]              pool-separate               -> origin/pool-separate
2025-12-04T08:57:06.4052809Z  * [new branch]              pr-156087                   -> origin/pr-156087
2025-12-04T08:57:06.4054976Z  * [new branch]              pr/131860                   -> origin/pr/131860
2025-12-04T08:57:06.4056656Z  * [new branch]              predispatch_to              -> origin/predispatch_to
2025-12-04T08:57:06.4058349Z  * [new branch]              protect-c17                 -> origin/protect-c17
2025-12-04T08:57:06.4059912Z  * [new branch]              pt-opt-cuda3                -> origin/pt-opt-cuda3
2025-12-04T08:57:06.4062085Z  * [new branch]              python_compiled_autograd    -> origin/python_compiled_autograd
2025-12-04T08:57:06.4064394Z  * [new branch]              q1l1/fix_device_moved_constant_type_unknown -> origin/q1l1/fix_device_moved_constant_type_unknown
2025-12-04T08:57:06.4066066Z  * [new branch]              q1l1/fix_wrong_default_type_for_kernel_call_args -> origin/q1l1/fix_wrong_default_type_for_kernel_call_args
2025-12-04T08:57:06.4068431Z  * [new branch]              qchip/export-D54134695      -> origin/qchip/export-D54134695
2025-12-04T08:57:06.4070201Z  * [new branch]              quote-pytest_cache          -> origin/quote-pytest_cache
2025-12-04T08:57:06.4072499Z  * [new branch]              reland-accgrad-stream-warn  -> origin/reland-accgrad-stream-warn
2025-12-04T08:57:06.4074678Z  * [new branch]              release/1.10                -> origin/release/1.10
2025-12-04T08:57:06.4076551Z  * [new branch]              release/1.11                -> origin/release/1.11
2025-12-04T08:57:06.4077828Z  * [new branch]              release/1.12                -> origin/release/1.12
2025-12-04T08:57:06.4079463Z  * [new branch]              release/1.13                -> origin/release/1.13
2025-12-04T08:57:06.4080995Z  * [new branch]              release/1.4                 -> origin/release/1.4
2025-12-04T08:57:06.4082367Z  * [new branch]              release/1.4.1               -> origin/release/1.4.1
2025-12-04T08:57:06.4083889Z  * [new branch]              release/1.5                 -> origin/release/1.5
2025-12-04T08:57:06.4085496Z  * [new branch]              release/1.6                 -> origin/release/1.6
2025-12-04T08:57:06.4087191Z  * [new branch]              release/1.7                 -> origin/release/1.7
2025-12-04T08:57:06.4088864Z  * [new branch]              release/1.8                 -> origin/release/1.8
2025-12-04T08:57:06.4090386Z  * [new branch]              release/1.9                 -> origin/release/1.9
2025-12-04T08:57:06.4091971Z  * [new branch]              release/2.0                 -> origin/release/2.0
2025-12-04T08:57:06.4093689Z  * [new branch]              release/2.1                 -> origin/release/2.1
2025-12-04T08:57:06.4095507Z  * [new branch]              release/2.2                 -> origin/release/2.2
2025-12-04T08:57:06.4097409Z  * [new branch]              release/2.3                 -> origin/release/2.3
2025-12-04T08:57:06.4099483Z  * [new branch]              release/2.4                 -> origin/release/2.4
2025-12-04T08:57:06.4101592Z  * [new branch]              release/2.5                 -> origin/release/2.5
2025-12-04T08:57:06.4103282Z  * [new branch]              release/2.6                 -> origin/release/2.6
2025-12-04T08:57:06.4105083Z  * [new branch]              release/2.7                 -> origin/release/2.7
2025-12-04T08:57:06.4106731Z  * [new branch]              release/2.8                 -> origin/release/2.8
2025-12-04T08:57:06.4108456Z  * [new branch]              release/2.9                 -> origin/release/2.9
2025-12-04T08:57:06.4110153Z  * [new branch]              release_notes               -> origin/release_notes
2025-12-04T08:57:06.4111871Z  * [new branch]              remove_pyinterpreter        -> origin/remove_pyinterpreter
2025-12-04T08:57:06.4113759Z  * [new branch]              replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836
2025-12-04T08:57:06.4115353Z  * [new branch]              replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248
2025-12-04T08:57:06.4116877Z  * [new branch]              replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324
2025-12-04T08:57:06.4118789Z  * [new branch]              replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020
2025-12-04T08:57:06.4122053Z  * [new branch]              revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head
2025-12-04T08:57:06.4124978Z  * [new branch]              revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head
2025-12-04T08:57:06.4128023Z  * [new branch]              revert-152361-gh/fadara01/1/head -> origin/revert-152361-gh/fadara01/1/head
2025-12-04T08:57:06.4131210Z  * [new branch]              revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head
2025-12-04T08:57:06.4133115Z  * [new branch]              revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_
2025-12-04T08:57:06.4134639Z  * [new branch]              revert-hoo-invoke-subgraph  -> origin/revert-hoo-invoke-subgraph
2025-12-04T08:57:06.4136368Z  * [new branch]              revert_always_build_distributed -> origin/revert_always_build_distributed
2025-12-04T08:57:06.4137967Z  * [new branch]              rms_norm_patch              -> origin/rms_norm_patch
2025-12-04T08:57:06.4140098Z  * [new branch]              ruisi/fix_all_to_all_estimation -> origin/ruisi/fix_all_to_all_estimation
2025-12-04T08:57:06.4141741Z  * [new branch]              ruisi/fix_comm_estimation   -> origin/ruisi/fix_comm_estimation
2025-12-04T08:57:06.4143148Z  * [new branch]              ruisi/fix_dynamic_shape_estimation -> origin/ruisi/fix_dynamic_shape_estimation
2025-12-04T08:57:06.4144599Z  * [new branch]              ruisi/fix_llama3_autobucketing -> origin/ruisi/fix_llama3_autobucketing
2025-12-04T08:57:06.4146448Z  * [new branch]              ruisi/fix_manual_bucketing_ep_pass -> origin/ruisi/fix_manual_bucketing_ep_pass
2025-12-04T08:57:06.4148464Z  * [new branch]              ruisi/manual_bucket_pass    -> origin/ruisi/manual_bucket_pass
2025-12-04T08:57:06.4150841Z  * [new branch]              ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures
2025-12-04T08:57:06.4152332Z  * [new branch]              ryanguo99/fix-closure-var   -> origin/ryanguo99/fix-closure-var
2025-12-04T08:57:06.4154524Z  * [new branch]              rzou/faketensor_bench       -> origin/rzou/faketensor_bench
2025-12-04T08:57:06.4155961Z  * [new branch]              rzou/njt                    -> origin/rzou/njt
2025-12-04T08:57:06.4157505Z  * [new branch]              rzou/pca                    -> origin/rzou/pca
2025-12-04T08:57:06.4159091Z  * [new branch]              rzou/realprop               -> origin/rzou/realprop
2025-12-04T08:57:06.4160872Z  * [new branch]              samplevllm                  -> origin/samplevllm
2025-12-04T08:57:06.4163327Z  * [new branch]              sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm
2025-12-04T08:57:06.4164973Z  * [new branch]              sapling-pr-archive-SS-JIA   -> origin/sapling-pr-archive-SS-JIA
2025-12-04T08:57:06.4166776Z  * [new branch]              sapling-pr-archive-tushar00jain -> origin/sapling-pr-archive-tushar00jain
2025-12-04T08:57:06.4168323Z  * [new branch]              save                        -> origin/save
2025-12-04T08:57:06.4169947Z  * [new branch]              scaled_mm                   -> origin/scaled_mm
2025-12-04T08:57:06.4171597Z  * [new branch]              scan_attempt                -> origin/scan_attempt
2025-12-04T08:57:06.4174089Z  * [new branch]              sdym/2.5.1                  -> origin/sdym/2.5.1
2025-12-04T08:57:06.4175901Z  * [new branch]              sekyondaMeta-dynamoconfig-fix -> origin/sekyondaMeta-dynamoconfig-fix
2025-12-04T08:57:06.4177987Z  * [new branch]              shengf/fx-xform-perf        -> origin/shengf/fx-xform-perf
2025-12-04T08:57:06.4179737Z  * [new branch]              shoumikhin-patch-1          -> origin/shoumikhin-patch-1
2025-12-04T08:57:06.4181677Z  * [new branch]              solve-accuracy-fix          -> origin/solve-accuracy-fix
2025-12-04T08:57:06.4183837Z  * [new branch]              some_rocm_inductor_skips    -> origin/some_rocm_inductor_skips
2025-12-04T08:57:06.4185914Z  * [new branch]              soulitzer/stash-tls-ac      -> origin/soulitzer/stash-tls-ac
2025-12-04T08:57:06.4187721Z  * [new branch]              sparse-mm-bf16-support      -> origin/sparse-mm-bf16-support
2025-12-04T08:57:06.4189346Z  * [new branch]              starterTaskUpdate           -> origin/starterTaskUpdate
2025-12-04T08:57:06.4190830Z  * [new branch]              suo                         -> origin/suo
2025-12-04T08:57:06.4192448Z  * [new branch]              sve-poc                     -> origin/sve-poc
2025-12-04T08:57:06.4194131Z  * [new branch]              switch-bn                   -> origin/switch-bn
2025-12-04T08:57:06.4195854Z  * [new branch]              sy_annotation_in_autograd_hop -> origin/sy_annotation_in_autograd_hop
2025-12-04T08:57:06.4197481Z  * [new branch]              sy_aot_eager_record         -> origin/sy_aot_eager_record
2025-12-04T08:57:06.4199095Z  * [new branch]              sy_custom_bucketing         -> origin/sy_custom_bucketing
2025-12-04T08:57:06.4201077Z  * [new branch]              sy_debug_mode_test          -> origin/sy_debug_mode_test
2025-12-04T08:57:06.4202987Z  * [new branch]              sy_deserialize              -> origin/sy_deserialize
2025-12-04T08:57:06.4204546Z  * [new branch]              sy_dump_gm_code             -> origin/sy_dump_gm_code
2025-12-04T08:57:06.4206143Z  * [new branch]              sy_exp                      -> origin/sy_exp
2025-12-04T08:57:06.4207883Z  * [new branch]              sy_export_annotation        -> origin/sy_export_annotation
2025-12-04T08:57:06.4209567Z  * [new branch]              sy_invoke_subgraph          -> origin/sy_invoke_subgraph
2025-12-04T08:57:06.4211490Z  * [new branch]              sy_kernel_bw_name           -> origin/sy_kernel_bw_name
2025-12-04T08:57:06.4213206Z  * [new branch]              sy_multi_arch               -> origin/sy_multi_arch
2025-12-04T08:57:06.4214911Z  * [new branch]              sy_nn_module_stack          -> origin/sy_nn_module_stack
2025-12-04T08:57:06.4216570Z  * [new branch]              sy_original_dtensor         -> origin/sy_original_dtensor
2025-12-04T08:57:06.4218422Z  * [new branch]              sy_profiler_cia             -> origin/sy_profiler_cia
2025-12-04T08:57:06.4219900Z  * [new branch]              symm_mem_sync               -> origin/symm_mem_sync
2025-12-04T08:57:06.4221512Z  * [new branch]              sympy-bottleneck-repro      -> origin/sympy-bottleneck-repro
2025-12-04T08:57:06.4223258Z  * [new branch]              tensordict_integration      -> origin/tensordict_integration
2025-12-04T08:57:06.4224933Z  * [new branch]              test-move-conda-builds      -> origin/test-move-conda-builds
2025-12-04T08:57:06.4226576Z  * [new branch]              test-old                    -> origin/test-old
2025-12-04T08:57:06.4228861Z  * [new branch]              test/bmm_heur               -> origin/test/bmm_heur
2025-12-04T08:57:06.4231117Z  * [new branch]              tianren/customOp_autotune_fix -> origin/tianren/customOp_autotune_fix
2025-12-04T08:57:06.4232672Z  * [new branch]              tianren/customOp_enable_max_autotune -> origin/tianren/customOp_enable_max_autotune
2025-12-04T08:57:06.4234136Z  * [new branch]              tianren/customOp_fusion     -> origin/tianren/customOp_fusion
2025-12-04T08:57:06.4236110Z  * [new branch]              tianren/customop_collectiveop_benchmark -> origin/tianren/customop_collectiveop_benchmark
2025-12-04T08:57:06.4237985Z  * [new branch]              tianren/customop_collectiveop_benchmark_fix -> origin/tianren/customop_collectiveop_benchmark_fix
2025-12-04T08:57:06.4239860Z  * [new branch]              tianren/customop_dynamic_config -> origin/tianren/customop_dynamic_config
2025-12-04T08:57:06.4241676Z  * [new branch]              tianren/dynamic_range_input -> origin/tianren/dynamic_range_input
2025-12-04T08:57:06.4243353Z  * [new branch]              tianren/dynamic_range_input_fix -> origin/tianren/dynamic_range_input_fix
2025-12-04T08:57:06.4244855Z  * [new branch]              tianren/dynamic_range_input_merge -> origin/tianren/dynamic_range_input_merge
2025-12-04T08:57:06.4246407Z  * [new branch]              tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp
2025-12-04T08:57:06.4247986Z  * [new branch]              tianren/fx_codegen_dump     -> origin/tianren/fx_codegen_dump
2025-12-04T08:57:06.4249596Z  * [new branch]              tianren/symmetric_memory    -> origin/tianren/symmetric_memory
2025-12-04T08:57:06.4251176Z  * [new branch]              tianren/test                -> origin/tianren/test
2025-12-04T08:57:06.4252951Z  * [new branch]              tidy_performance_cyy        -> origin/tidy_performance_cyy
2025-12-04T08:57:06.4254585Z  * [new branch]              tmp                         -> origin/tmp
2025-12-04T08:57:06.4256322Z  * [new branch]              torchtitan_ep               -> origin/torchtitan_ep
2025-12-04T08:57:06.4258022Z  * [new branch]              torchtitan_integration      -> origin/torchtitan_integration
2025-12-04T08:57:06.4259728Z  * [new branch]              trace_fsdp_torchtune_lora   -> origin/trace_fsdp_torchtune_lora
2025-12-04T08:57:06.4261558Z  * [new branch]              traceable_fsdp_unit_tests   -> origin/traceable_fsdp_unit_tests
2025-12-04T08:57:06.4263137Z  * [new branch]              tree_loop_vec_base          -> origin/tree_loop_vec_base
2025-12-04T08:57:06.4264766Z  * [new branch]              triton_kernel               -> origin/triton_kernel
2025-12-04T08:57:06.4266500Z  * [new branch]              tt_pkg_1908                 -> origin/tt_pkg_1908
2025-12-04T08:57:06.4268162Z  * [new branch]              type_dec                    -> origin/type_dec
2025-12-04T08:57:06.4269871Z  * [new branch]              udate-sphinx-dependancies   -> origin/udate-sphinx-dependancies
2025-12-04T08:57:06.4272115Z  * [new branch]              update-audio-commit-hash/17630256502-1803-1 -> origin/update-audio-commit-hash/17630256502-1803-1
2025-12-04T08:57:06.4273611Z  * [new branch]              update-audio-commit-hash/19087141161-1916-1 -> origin/update-audio-commit-hash/19087141161-1916-1
2025-12-04T08:57:06.4275243Z  * [new branch]              update-audio-commit-hash/19250643381-1929-1 -> origin/update-audio-commit-hash/19250643381-1929-1
2025-12-04T08:57:06.4276764Z  * [new branch]              update-audio-commit-hash/19397724337-1935-1 -> origin/update-audio-commit-hash/19397724337-1935-1
2025-12-04T08:57:06.4278335Z  * [new branch]              update-audio-commit-hash/19555670148-1941-1 -> origin/update-audio-commit-hash/19555670148-1941-1
2025-12-04T08:57:06.4280355Z  * [new branch]              update-audio-commit-hash/19750627930-1946-1 -> origin/update-audio-commit-hash/19750627930-1946-1
2025-12-04T08:57:06.4282668Z  * [new branch]              update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2
2025-12-04T08:57:06.4284758Z  * [new branch]              update-vision-commit-hash/19087141161-1916-1 -> origin/update-vision-commit-hash/19087141161-1916-1
2025-12-04T08:57:06.4286253Z  * [new branch]              update-vision-commit-hash/19184897099-1925-1 -> origin/update-vision-commit-hash/19184897099-1925-1
2025-12-04T08:57:06.4287749Z  * [new branch]              update-vision-commit-hash/19250643381-1929-1 -> origin/update-vision-commit-hash/19250643381-1929-1
2025-12-04T08:57:06.4289259Z  * [new branch]              update-vision-commit-hash/19381328640-1934-1 -> origin/update-vision-commit-hash/19381328640-1934-1
2025-12-04T08:57:06.4290821Z  * [new branch]              update-vision-commit-hash/19485237164-1938-1 -> origin/update-vision-commit-hash/19485237164-1938-1
2025-12-04T08:57:06.4292990Z  * [new branch]              update-vllm-commit-hash/18451675449-1879-1 -> origin/update-vllm-commit-hash/18451675449-1879-1
2025-12-04T08:57:06.4294636Z  * [new branch]              update-vllm-dockerfile      -> origin/update-vllm-dockerfile
2025-12-04T08:57:06.4296872Z  * [new branch]              update-xla-commit-hash/19224287370-211-1 -> origin/update-xla-commit-hash/19224287370-211-1
2025-12-04T08:57:06.4298438Z  * [new branch]              update-xla-commit-hash/19422028566-212-1 -> origin/update-xla-commit-hash/19422028566-212-1
2025-12-04T08:57:06.4300035Z  * [new branch]              update-xla-commit-hash/19626841311-213-1 -> origin/update-xla-commit-hash/19626841311-213-1
2025-12-04T08:57:06.4301830Z  * [new branch]              update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388
2025-12-04T08:57:06.4303450Z  * [new branch]              update_operator_readme      -> origin/update_operator_readme
2025-12-04T08:57:06.4305189Z  * [new branch]              update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736
2025-12-04T08:57:06.4306818Z  * [new branch]              update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173
2025-12-04T08:57:06.4308470Z  * [new branch]              update_slow_tests_1762155677 -> origin/update_slow_tests_1762155677
2025-12-04T08:57:06.4310163Z  * [new branch]              update_slow_tests_1763365283 -> origin/update_slow_tests_1763365283
2025-12-04T08:57:06.4311950Z  * [new branch]              update_submodule_FBGEMM     -> origin/update_submodule_FBGEMM
2025-12-04T08:57:06.4313568Z  * [new branch]              update_submodule_kineto     -> origin/update_submodule_kineto
2025-12-04T08:57:06.4315239Z  * [new branch]              update_submodule_tensorpipe -> origin/update_submodule_tensorpipe
2025-12-04T08:57:06.4316930Z  * [new branch]              upload-tests-for-autorevert -> origin/upload-tests-for-autorevert
2025-12-04T08:57:06.4318904Z  * [new branch]              v0.1.2                      -> origin/v0.1.2
2025-12-04T08:57:06.4320863Z  * [new branch]              v1.0.1                      -> origin/v1.0.1
2025-12-04T08:57:06.4322646Z  * [new branch]              v1.0.3                      -> origin/v1.0.3
2025-12-04T08:57:06.4324345Z  * [new branch]              v1.1.0                      -> origin/v1.1.0
2025-12-04T08:57:06.4326515Z  * [new branch]              v1.2.0                      -> origin/v1.2.0
2025-12-04T08:57:06.4328234Z  * [new branch]              v1.3.0                      -> origin/v1.3.0
2025-12-04T08:57:06.4329992Z  * [new branch]              v1.3.1                      -> origin/v1.3.1
2025-12-04T08:57:06.4331746Z  * [new branch]              validate_fn                 -> origin/validate_fn
2025-12-04T08:57:06.4333585Z  * [new branch]              validations_2.6             -> origin/validations_2.6
2025-12-04T08:57:06.4335330Z  * [new branch]              validations_2.8             -> origin/validations_2.8
2025-12-04T08:57:06.4337028Z  * [new branch]              varlen-api                  -> origin/varlen-api
2025-12-04T08:57:06.4338756Z  * [new branch]              varlen-api-backup           -> origin/varlen-api-backup
2025-12-04T08:57:06.4340361Z  * [new branch]              varlen_batch_invariance     -> origin/varlen_batch_invariance
2025-12-04T08:57:06.4342368Z  * [new branch]              viable/strict               -> origin/viable/strict
2025-12-04T08:57:06.4344672Z  * [new branch]              vishal9-team/dtensor_parallelism_toy -> origin/vishal9-team/dtensor_parallelism_toy
2025-12-04T08:57:06.4346369Z  * [new branch]              vllmbuildci                 -> origin/vllmbuildci
2025-12-04T08:57:06.4348078Z  * [new branch]              vllmpin                     -> origin/vllmpin
2025-12-04T08:57:06.4350283Z  * [new branch]              vscode-recommend-pyrefly    -> origin/vscode-recommend-pyrefly
2025-12-04T08:57:06.4352070Z  * [new branch]              wdvr-patch-1                -> origin/wdvr-patch-1
2025-12-04T08:57:06.4354285Z  * [new branch]              wdvr/iss_145259             -> origin/wdvr/iss_145259
2025-12-04T08:57:06.4356448Z  * [new branch]              whc/pei                     -> origin/whc/pei
2025-12-04T08:57:06.4357977Z  * [new branch]              whc/pp_fix                  -> origin/whc/pp_fix
2025-12-04T08:57:06.4359598Z  * [new branch]              whc/sharding                -> origin/whc/sharding
2025-12-04T08:57:06.4361247Z  * [new branch]              whc/sharding2               -> origin/whc/sharding2
2025-12-04T08:57:06.4362703Z  * [new branch]              whc/uneven                  -> origin/whc/uneven
2025-12-04T08:57:06.4364466Z  * [new branch]              whc/uneven-merge            -> origin/whc/uneven-merge
2025-12-04T08:57:06.4366161Z  * [new branch]              win_warnings                -> origin/win_warnings
2025-12-04T08:57:06.4367934Z  * [new branch]              windows_libtorch_free       -> origin/windows_libtorch_free
2025-12-04T08:57:06.4369569Z  * [new branch]              xmfan-war                   -> origin/xmfan-war
2025-12-04T08:57:06.4371738Z  * [new branch]              xmfan/ca_0516               -> origin/xmfan/ca_0516
2025-12-04T08:57:06.4373260Z  * [new branch]              xmfan/ca_1051b93192         -> origin/xmfan/ca_1051b93192
2025-12-04T08:57:06.4374910Z  * [new branch]              xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8
2025-12-04T08:57:06.4376294Z  * [new branch]              xmfan/ca_5a2be192d1         -> origin/xmfan/ca_5a2be192d1
2025-12-04T08:57:06.4377908Z  * [new branch]              xmfan/ca_9d59b516e9         -> origin/xmfan/ca_9d59b516e9
2025-12-04T08:57:06.4379324Z  * [new branch]              xmfan/ca_apr8               -> origin/xmfan/ca_apr8
2025-12-04T08:57:06.4380840Z  * [new branch]              xmfan/ca_base               -> origin/xmfan/ca_base
2025-12-04T08:57:06.4382761Z  * [new branch]              xmfan/ca_dynamic            -> origin/xmfan/ca_dynamic
2025-12-04T08:57:06.4384735Z  * [new branch]              xmfan/ca_fix_dyn            -> origin/xmfan/ca_fix_dyn
2025-12-04T08:57:06.4386366Z  * [new branch]              xmfan/ca_fix_lowering       -> origin/xmfan/ca_fix_lowering
2025-12-04T08:57:06.4388031Z  * [new branch]              xmfan/ca_fix_polyfills      -> origin/xmfan/ca_fix_polyfills
2025-12-04T08:57:06.4389544Z  * [new branch]              xmfan/ca_jan3               -> origin/xmfan/ca_jan3
2025-12-04T08:57:06.4391106Z  * [new branch]              xmfan/ca_jun18              -> origin/xmfan/ca_jun18
2025-12-04T08:57:06.4392735Z  * [new branch]              xmfan/ca_jun24              -> origin/xmfan/ca_jun24
2025-12-04T08:57:06.4394404Z  * [new branch]              xmfan/ca_nested             -> origin/xmfan/ca_nested
2025-12-04T08:57:06.4395992Z  * [new branch]              xmfan/ca_overhead           -> origin/xmfan/ca_overhead
2025-12-04T08:57:06.4397660Z  * [new branch]              xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451
2025-12-04T08:57:06.4399194Z  * [new branch]              xmfan/cacu_jun18            -> origin/xmfan/cacu_jun18
2025-12-04T08:57:06.4400982Z  * [new branch]              xmfan/cacu_jun19            -> origin/xmfan/cacu_jun19
2025-12-04T08:57:06.4403096Z  * [new branch]              xmfan/cacu_jun4             -> origin/xmfan/cacu_jun4
2025-12-04T08:57:06.4404744Z  * [new branch]              xmfan/disable_duck_shape    -> origin/xmfan/disable_duck_shape
2025-12-04T08:57:06.4406441Z  * [new branch]              xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough
2025-12-04T08:57:06.4408193Z  * [new branch]              xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9
2025-12-04T08:57:06.4409798Z  * [new branch]              xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9
2025-12-04T08:57:06.4411265Z  * [new branch]              xmfan/single_step           -> origin/xmfan/single_step
2025-12-04T08:57:06.4412875Z  * [new branch]              xmfan/sth_0829              -> origin/xmfan/sth_0829
2025-12-04T08:57:06.4414529Z  * [new branch]              xmfan/test                  -> origin/xmfan/test
2025-12-04T08:57:06.4416737Z  * [new branch]              yguo/debug-0226-constexpr   -> origin/yguo/debug-0226-constexpr
2025-12-04T08:57:06.4419907Z  * [new branch]              yguo/new_latest_changes     -> origin/yguo/new_latest_changes
2025-12-04T08:57:06.4421469Z  * [new branch]              yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes
2025-12-04T08:57:06.4423524Z  * [new branch]              yiming/bootcamp             -> origin/yiming/bootcamp
2025-12-04T08:57:06.4425096Z  * [new branch]              yiming/run_with_start_end_rng_hop -> origin/yiming/run_with_start_end_rng_hop
2025-12-04T08:57:06.4426733Z  * [new branch]              yolo-llama3                 -> origin/yolo-llama3
2025-12-04T08:57:06.4428999Z  * [new branch]              zainr/canary-test           -> origin/zainr/canary-test
2025-12-04T08:57:06.4430664Z  * [new branch]              zainr/cleanup-gh-runners    -> origin/zainr/cleanup-gh-runners
2025-12-04T08:57:06.4432244Z  * [new branch]              zainr/pull-migration-c      -> origin/zainr/pull-migration-c
2025-12-04T08:57:06.4433699Z  * [new branch]              zainr/test2                 -> origin/zainr/test2
2025-12-04T08:57:06.4435509Z  * [new branch]              zasdfgbnm-patch-3           -> origin/zasdfgbnm-patch-3
2025-12-04T08:57:06.4437291Z  * [new branch]              zb2p                        -> origin/zb2p
2025-12-04T08:57:06.4438919Z  * [new branch]              zeros-and-scatter-part2     -> origin/zeros-and-scatter-part2
2025-12-04T08:57:06.4441604Z  * [new branch]              zhxchen17/ci/vllm_lora_oom  -> origin/zhxchen17/ci/vllm_lora_oom
2025-12-04T08:57:06.4443202Z  * [new branch]              zhxchen17/ci/vllm_multimodal_oom -> origin/zhxchen17/ci/vllm_multimodal_oom
2025-12-04T08:57:06.4444709Z  * [new branch]              zhxchen17/ci/vllm_pin       -> origin/zhxchen17/ci/vllm_pin
2025-12-04T08:57:06.4446893Z  * [new branch]              zhxchen17/dynamo/unsafe_drop_all_guards -> origin/zhxchen17/dynamo/unsafe_drop_all_guards
2025-12-04T08:57:06.4449021Z  * [new branch]              zhxchen17/export/call_override -> origin/zhxchen17/export/call_override
2025-12-04T08:57:06.4450595Z  * [new branch]              zhxchen17/export/codemod1   -> origin/zhxchen17/export/codemod1
2025-12-04T08:57:06.4452189Z  * [new branch]              zhxchen17/export/ctx_return -> origin/zhxchen17/export/ctx_return
2025-12-04T08:57:06.4453908Z  * [new branch]              zhxchen17/export/disable_side_effect_warn -> origin/zhxchen17/export/disable_side_effect_warn
2025-12-04T08:57:06.4455452Z  * [new branch]              zhxchen17/export/pytree_check -> origin/zhxchen17/export/pytree_check
2025-12-04T08:57:06.4457522Z  * [new branch]              zhxchen17/precompile/aoti   -> origin/zhxchen17/precompile/aoti
2025-12-04T08:57:06.4459171Z  * [new branch]              zhxchen17/precompile/globals -> origin/zhxchen17/precompile/globals
2025-12-04T08:57:06.4460768Z  * [new branch]              zhxchen17/precompile/inductor_guards -> origin/zhxchen17/precompile/inductor_guards
2025-12-04T08:57:06.4462856Z  * [new branch]              zhxchen17/scratch/0         -> origin/zhxchen17/scratch/0
2025-12-04T08:57:06.4464478Z  * [new branch]              zhxchen17/torch_export_api_update -> origin/zhxchen17/torch_export_api_update
2025-12-04T08:57:06.4466606Z  * [new branch]              zhxhcen17/moodycamel        -> origin/zhxhcen17/moodycamel
2025-12-04T08:57:06.4468804Z  * [new branch]              zxiiro/build-times          -> origin/zxiiro/build-times
2025-12-04T08:57:06.4470422Z  * [new branch]              zxiiro/c7i.2xlarge          -> origin/zxiiro/c7i.2xlarge
2025-12-04T08:57:06.4472073Z  * [new branch]              zxiiro/c7i.2xlarge.h100     -> origin/zxiiro/c7i.2xlarge.h100
2025-12-04T08:57:06.4473615Z  * [new branch]              zxiiro/main                 -> origin/zxiiro/main
2025-12-04T08:57:06.4475175Z  * [new branch]              zxiiro/risc64               -> origin/zxiiro/risc64
2025-12-04T08:57:06.4476779Z  * [new branch]              zxiiro/test-multicloud-arc  -> origin/zxiiro/test-multicloud-arc
2025-12-04T08:57:06.4478212Z  * [new tag]                 bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug -> bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug
2025-12-04T08:57:06.4479514Z  * [new tag]                 ci/binaries/77164           -> ci/binaries/77164
2025-12-04T08:57:06.4481109Z  * [new tag]                 ciflow/b200/115316          -> ciflow/b200/115316
2025-12-04T08:57:06.4482269Z  * [new tag]                 ciflow/b200/160685          -> ciflow/b200/160685
2025-12-04T08:57:06.4483324Z  * [new tag]                 ciflow/b200/161607          -> ciflow/b200/161607
2025-12-04T08:57:06.4484483Z  * [new tag]                 ciflow/b200/161938          -> ciflow/b200/161938
2025-12-04T08:57:06.4485725Z  * [new tag]                 ciflow/b200/167207          -> ciflow/b200/167207
2025-12-04T08:57:06.4486716Z  * [new tag]                 ciflow/b200/167989          -> ciflow/b200/167989
2025-12-04T08:57:06.4487960Z  * [new tag]                 ciflow/b200/168096          -> ciflow/b200/168096
2025-12-04T08:57:06.4489096Z  * [new tag]                 ciflow/b200/168175          -> ciflow/b200/168175
2025-12-04T08:57:06.4490092Z  * [new tag]                 ciflow/b200/168195          -> ciflow/b200/168195
2025-12-04T08:57:06.4491626Z  * [new tag]                 ciflow/b200/169200          -> ciflow/b200/169200
2025-12-04T08:57:06.4492497Z  * [new tag]                 ciflow/b200/169216          -> ciflow/b200/169216
2025-12-04T08:57:06.4494233Z  * [new tag]                 ciflow/b200/169380          -> ciflow/b200/169380
2025-12-04T08:57:06.4495929Z  * [new tag]                 ciflow/b200/169412          -> ciflow/b200/169412
2025-12-04T08:57:06.4497443Z  * [new tag]                 ciflow/b200/169470          -> ciflow/b200/169470
2025-12-04T08:57:06.4498456Z  * [new tag]                 ciflow/b200/169471          -> ciflow/b200/169471
2025-12-04T08:57:06.4499723Z  * [new tag]                 ciflow/b200/169472          -> ciflow/b200/169472
2025-12-04T08:57:06.4501169Z  * [new tag]                 ciflow/b200/169514          -> ciflow/b200/169514
2025-12-04T08:57:06.4502177Z  * [new tag]                 ciflow/b200/169517          -> ciflow/b200/169517
2025-12-04T08:57:06.4503691Z  * [new tag]                 ciflow/binaries/165922      -> ciflow/binaries/165922
2025-12-04T08:57:06.4504685Z  * [new tag]                 ciflow/binaries/169510      -> ciflow/binaries/169510
2025-12-04T08:57:06.4506270Z  * [new tag]                 ciflow/binaries_wheel/157994 -> ciflow/binaries_wheel/157994
2025-12-04T08:57:06.4507274Z  * [new tag]                 ciflow/binaries_wheel/166829 -> ciflow/binaries_wheel/166829
2025-12-04T08:57:06.4508319Z  * [new tag]                 ciflow/binaries_wheel/167972 -> ciflow/binaries_wheel/167972
2025-12-04T08:57:06.4509625Z  * [new tag]                 ciflow/binaries_wheel/167981 -> ciflow/binaries_wheel/167981
2025-12-04T08:57:06.4510824Z  * [new tag]                 ciflow/dynamo/167695        -> ciflow/dynamo/167695
2025-12-04T08:57:06.4511879Z  * [new tag]                 ciflow/dynamo/168096        -> ciflow/dynamo/168096
2025-12-04T08:57:06.4513292Z  * [new tag]                 ciflow/dynamo/169525        -> ciflow/dynamo/169525
2025-12-04T08:57:06.4514651Z  * [new tag]                 ciflow/h100-cutlass-backend/161938 -> ciflow/h100-cutlass-backend/161938
2025-12-04T08:57:06.4515581Z  * [new tag]                 ciflow/h100-cutlass-backend/161940 -> ciflow/h100-cutlass-backend/161940
2025-12-04T08:57:06.4517168Z  * [new tag]                 ciflow/h100-distributed/168923 -> ciflow/h100-distributed/168923
2025-12-04T08:57:06.4518434Z  * [new tag]                 ciflow/h100-symm-mem/167552 -> ciflow/h100-symm-mem/167552
2025-12-04T08:57:06.4519397Z  * [new tag]                 ciflow/h100-symm-mem/168129 -> ciflow/h100-symm-mem/168129
2025-12-04T08:57:06.4520498Z  * [new tag]                 ciflow/h100-symm-mem/168917 -> ciflow/h100-symm-mem/168917
2025-12-04T08:57:06.4521998Z  * [new tag]                 ciflow/h100-symm-mem/169156 -> ciflow/h100-symm-mem/169156
2025-12-04T08:57:06.4522931Z  * [new tag]                 ciflow/h100-symm-mem/169200 -> ciflow/h100-symm-mem/169200
2025-12-04T08:57:06.4523960Z  * [new tag]                 ciflow/h100-symm-mem/169216 -> ciflow/h100-symm-mem/169216
2025-12-04T08:57:06.4525155Z  * [new tag]                 ciflow/h100-symm-mem/169338 -> ciflow/h100-symm-mem/169338
2025-12-04T08:57:06.4526741Z  * [new tag]                 ciflow/h100-symm-mem/169355 -> ciflow/h100-symm-mem/169355
2025-12-04T08:57:06.4527706Z  * [new tag]                 ciflow/h100-symm-mem/169543 -> ciflow/h100-symm-mem/169543
2025-12-04T08:57:06.4529049Z  * [new tag]                 ciflow/h100/115316          -> ciflow/h100/115316
2025-12-04T08:57:06.4530040Z  * [new tag]                 ciflow/h100/160685          -> ciflow/h100/160685
2025-12-04T08:57:06.4531270Z  * [new tag]                 ciflow/h100/160729          -> ciflow/h100/160729
2025-12-04T08:57:06.4532189Z  * [new tag]                 ciflow/h100/161607          -> ciflow/h100/161607
2025-12-04T08:57:06.4533359Z  * [new tag]                 ciflow/h100/161938          -> ciflow/h100/161938
2025-12-04T08:57:06.4534307Z  * [new tag]                 ciflow/h100/167207          -> ciflow/h100/167207
2025-12-04T08:57:06.4535641Z  * [new tag]                 ciflow/h100/167989          -> ciflow/h100/167989
2025-12-04T08:57:06.4536421Z  * [new tag]                 ciflow/h100/168096          -> ciflow/h100/168096
2025-12-04T08:57:06.4537417Z  * [new tag]                 ciflow/h100/168175          -> ciflow/h100/168175
2025-12-04T08:57:06.4538646Z  * [new tag]                 ciflow/h100/168195          -> ciflow/h100/168195
2025-12-04T08:57:06.4539510Z  * [new tag]                 ciflow/h100/168980          -> ciflow/h100/168980
2025-12-04T08:57:06.4540944Z  * [new tag]                 ciflow/h100/169200          -> ciflow/h100/169200
2025-12-04T08:57:06.4542360Z  * [new tag]                 ciflow/h100/169216          -> ciflow/h100/169216
2025-12-04T08:57:06.4543821Z  * [new tag]                 ciflow/h100/169380          -> ciflow/h100/169380
2025-12-04T08:57:06.4544752Z  * [new tag]                 ciflow/h100/169412          -> ciflow/h100/169412
2025-12-04T08:57:06.4545987Z  * [new tag]                 ciflow/h100/169470          -> ciflow/h100/169470
2025-12-04T08:57:06.4547081Z  * [new tag]                 ciflow/h100/169471          -> ciflow/h100/169471
2025-12-04T08:57:06.4548228Z  * [new tag]                 ciflow/h100/169472          -> ciflow/h100/169472
2025-12-04T08:57:06.4549350Z  * [new tag]                 ciflow/h100/169514          -> ciflow/h100/169514
2025-12-04T08:57:06.4550684Z  * [new tag]                 ciflow/inductor-cu126/168096 -> ciflow/inductor-cu126/168096
2025-12-04T08:57:06.4552240Z  * [new tag]                 ciflow/inductor-micro-benchmark-cpu-x86/168096 -> ciflow/inductor-micro-benchmark-cpu-x86/168096
2025-12-04T08:57:06.4553513Z  * [new tag]                 ciflow/inductor-micro-benchmark/166165 -> ciflow/inductor-micro-benchmark/166165
2025-12-04T08:57:06.4554399Z  * [new tag]                 ciflow/inductor-micro-benchmark/168096 -> ciflow/inductor-micro-benchmark/168096
2025-12-04T08:57:06.4556311Z  * [new tag]                 ciflow/inductor-perf-compare/168096 -> ciflow/inductor-perf-compare/168096
2025-12-04T08:57:06.4558017Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi300/168073 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168073
2025-12-04T08:57:06.4559026Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi300/168096 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168096
2025-12-04T08:57:06.4560354Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi300/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi300/169024
2025-12-04T08:57:06.4561666Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi355/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi355/169024
2025-12-04T08:57:06.4562958Z  * [new tag]                 ciflow/inductor-perf-test-nightly/168096 -> ciflow/inductor-perf-test-nightly/168096
2025-12-04T08:57:06.4564216Z  * [new tag]                 ciflow/inductor-periodic/168096 -> ciflow/inductor-periodic/168096
2025-12-04T08:57:06.4565255Z  * [new tag]                 ciflow/inductor-periodic/169024 -> ciflow/inductor-periodic/169024
2025-12-04T08:57:06.4566461Z  * [new tag]                 ciflow/inductor-periodic/169425 -> ciflow/inductor-periodic/169425
2025-12-04T08:57:06.4567814Z  * [new tag]                 ciflow/inductor-rocm-mi200/165545 -> ciflow/inductor-rocm-mi200/165545
2025-12-04T08:57:06.4569025Z  * [new tag]                 ciflow/inductor-rocm-mi200/165997 -> ciflow/inductor-rocm-mi200/165997
2025-12-04T08:57:06.4570040Z  * [new tag]                 ciflow/inductor-rocm-mi200/168096 -> ciflow/inductor-rocm-mi200/168096
2025-12-04T08:57:06.4571251Z  * [new tag]                 ciflow/inductor-rocm-mi200/169063 -> ciflow/inductor-rocm-mi200/169063
2025-12-04T08:57:06.4572296Z  * [new tag]                 ciflow/inductor-rocm-mi200/169425 -> ciflow/inductor-rocm-mi200/169425
2025-12-04T08:57:06.4573552Z  * [new tag]                 ciflow/inductor-rocm-mi300/165545 -> ciflow/inductor-rocm-mi300/165545
2025-12-04T08:57:06.4574685Z  * [new tag]                 ciflow/inductor-rocm-mi300/168096 -> ciflow/inductor-rocm-mi300/168096
2025-12-04T08:57:06.4575528Z  * [new tag]                 ciflow/inductor-rocm-mi300/169063 -> ciflow/inductor-rocm-mi300/169063
2025-12-04T08:57:06.4576784Z  * [new tag]                 ciflow/inductor-rocm-mi300/169425 -> ciflow/inductor-rocm-mi300/169425
2025-12-04T08:57:06.4578048Z  * [new tag]                 ciflow/inductor-rocm/162052 -> ciflow/inductor-rocm/162052
2025-12-04T08:57:06.4579079Z  * [new tag]                 ciflow/inductor-rocm/168971 -> ciflow/inductor-rocm/168971
2025-12-04T08:57:06.4580413Z  * [new tag]                 ciflow/inductor-windows/168096 -> ciflow/inductor-windows/168096
2025-12-04T08:57:06.4581642Z  * [new tag]                 ciflow/inductor/144542      -> ciflow/inductor/144542
2025-12-04T08:57:06.4582748Z  * [new tag]                 ciflow/inductor/146506      -> ciflow/inductor/146506
2025-12-04T08:57:06.4583879Z  * [new tag]                 ciflow/inductor/147990      -> ciflow/inductor/147990
2025-12-04T08:57:06.4585059Z  * [new tag]                 ciflow/inductor/148294      -> ciflow/inductor/148294
2025-12-04T08:57:06.4586070Z  * [new tag]                 ciflow/inductor/148492      -> ciflow/inductor/148492
2025-12-04T08:57:06.4587152Z  * [new tag]                 ciflow/inductor/157149      -> ciflow/inductor/157149
2025-12-04T08:57:06.4588240Z  * [new tag]                 ciflow/inductor/157994      -> ciflow/inductor/157994
2025-12-04T08:57:06.4589229Z  * [new tag]                 ciflow/inductor/160685      -> ciflow/inductor/160685
2025-12-04T08:57:06.4590310Z  * [new tag]                 ciflow/inductor/160686      -> ciflow/inductor/160686
2025-12-04T08:57:06.4591844Z  * [new tag]                 ciflow/inductor/160687      -> ciflow/inductor/160687
2025-12-04T08:57:06.4592964Z  * [new tag]                 ciflow/inductor/160688      -> ciflow/inductor/160688
2025-12-04T08:57:06.4594270Z  * [new tag]                 ciflow/inductor/160706      -> ciflow/inductor/160706
2025-12-04T08:57:06.4595726Z  * [new tag]                 ciflow/inductor/160729      -> ciflow/inductor/160729
2025-12-04T08:57:06.4597114Z  * [new tag]                 ciflow/inductor/161938      -> ciflow/inductor/161938
2025-12-04T08:57:06.4598284Z  * [new tag]                 ciflow/inductor/161939      -> ciflow/inductor/161939
2025-12-04T08:57:06.4599460Z  * [new tag]                 ciflow/inductor/161940      -> ciflow/inductor/161940
2025-12-04T08:57:06.4600724Z  * [new tag]                 ciflow/inductor/162052      -> ciflow/inductor/162052
2025-12-04T08:57:06.4601831Z  * [new tag]                 ciflow/inductor/162275      -> ciflow/inductor/162275
2025-12-04T08:57:06.4602944Z  * [new tag]                 ciflow/inductor/162795      -> ciflow/inductor/162795
2025-12-04T08:57:06.4604219Z  * [new tag]                 ciflow/inductor/163245      -> ciflow/inductor/163245
2025-12-04T08:57:06.4605410Z  * [new tag]                 ciflow/inductor/163335      -> ciflow/inductor/163335
2025-12-04T08:57:06.4606494Z  * [new tag]                 ciflow/inductor/163503      -> ciflow/inductor/163503
2025-12-04T08:57:06.4607627Z  * [new tag]                 ciflow/inductor/163942      -> ciflow/inductor/163942
2025-12-04T08:57:06.4608849Z  * [new tag]                 ciflow/inductor/165270      -> ciflow/inductor/165270
2025-12-04T08:57:06.4610038Z  * [new tag]                 ciflow/inductor/165274      -> ciflow/inductor/165274
2025-12-04T08:57:06.4611120Z  * [new tag]                 ciflow/inductor/165322      -> ciflow/inductor/165322
2025-12-04T08:57:06.4612247Z  * [new tag]                 ciflow/inductor/165597      -> ciflow/inductor/165597
2025-12-04T08:57:06.4613390Z  * [new tag]                 ciflow/inductor/166063      -> ciflow/inductor/166063
2025-12-04T08:57:06.4614585Z  * [new tag]                 ciflow/inductor/166075      -> ciflow/inductor/166075
2025-12-04T08:57:06.4615709Z  * [new tag]                 ciflow/inductor/166165      -> ciflow/inductor/166165
2025-12-04T08:57:06.4617247Z  * [new tag]                 ciflow/inductor/166254      -> ciflow/inductor/166254
2025-12-04T08:57:06.4618436Z  * [new tag]                 ciflow/inductor/166483      -> ciflow/inductor/166483
2025-12-04T08:57:06.4619427Z  * [new tag]                 ciflow/inductor/166494      -> ciflow/inductor/166494
2025-12-04T08:57:06.4620623Z  * [new tag]                 ciflow/inductor/166545      -> ciflow/inductor/166545
2025-12-04T08:57:06.4621750Z  * [new tag]                 ciflow/inductor/166788      -> ciflow/inductor/166788
2025-12-04T08:57:06.4622971Z  * [new tag]                 ciflow/inductor/166846      -> ciflow/inductor/166846
2025-12-04T08:57:06.4624135Z  * [new tag]                 ciflow/inductor/167300      -> ciflow/inductor/167300
2025-12-04T08:57:06.4625234Z  * [new tag]                 ciflow/inductor/167407      -> ciflow/inductor/167407
2025-12-04T08:57:06.4626443Z  * [new tag]                 ciflow/inductor/167536      -> ciflow/inductor/167536
2025-12-04T08:57:06.4627610Z  * [new tag]                 ciflow/inductor/167552      -> ciflow/inductor/167552
2025-12-04T08:57:06.4628747Z  * [new tag]                 ciflow/inductor/167555      -> ciflow/inductor/167555
2025-12-04T08:57:06.4630146Z  * [new tag]                 ciflow/inductor/167583      -> ciflow/inductor/167583
2025-12-04T08:57:06.4631375Z  * [new tag]                 ciflow/inductor/167599      -> ciflow/inductor/167599
2025-12-04T08:57:06.4632555Z  * [new tag]                 ciflow/inductor/167647      -> ciflow/inductor/167647
2025-12-04T08:57:06.4633718Z  * [new tag]                 ciflow/inductor/167677      -> ciflow/inductor/167677
2025-12-04T08:57:06.4634815Z  * [new tag]                 ciflow/inductor/167680      -> ciflow/inductor/167680
2025-12-04T08:57:06.4635950Z  * [new tag]                 ciflow/inductor/167695      -> ciflow/inductor/167695
2025-12-04T08:57:06.4637087Z  * [new tag]                 ciflow/inductor/167742      -> ciflow/inductor/167742
2025-12-04T08:57:06.4638229Z  * [new tag]                 ciflow/inductor/167768      -> ciflow/inductor/167768
2025-12-04T08:57:06.4639568Z  * [new tag]                 ciflow/inductor/167773      -> ciflow/inductor/167773
2025-12-04T08:57:06.4640833Z  * [new tag]                 ciflow/inductor/167781      -> ciflow/inductor/167781
2025-12-04T08:57:06.4642039Z  * [new tag]                 ciflow/inductor/167880      -> ciflow/inductor/167880
2025-12-04T08:57:06.4643220Z  * [new tag]                 ciflow/inductor/167887      -> ciflow/inductor/167887
2025-12-04T08:57:06.4644319Z  * [new tag]                 ciflow/inductor/167972      -> ciflow/inductor/167972
2025-12-04T08:57:06.4645417Z  * [new tag]                 ciflow/inductor/167989      -> ciflow/inductor/167989
2025-12-04T08:57:06.4646573Z  * [new tag]                 ciflow/inductor/168002      -> ciflow/inductor/168002
2025-12-04T08:57:06.4647699Z  * [new tag]                 ciflow/inductor/168050      -> ciflow/inductor/168050
2025-12-04T08:57:06.4648827Z  * [new tag]                 ciflow/inductor/168051      -> ciflow/inductor/168051
2025-12-04T08:57:06.4649986Z  * [new tag]                 ciflow/inductor/168052      -> ciflow/inductor/168052
2025-12-04T08:57:06.4651124Z  * [new tag]                 ciflow/inductor/168073      -> ciflow/inductor/168073
2025-12-04T08:57:06.4652293Z  * [new tag]                 ciflow/inductor/168096      -> ciflow/inductor/168096
2025-12-04T08:57:06.4653420Z  * [new tag]                 ciflow/inductor/168114      -> ciflow/inductor/168114
2025-12-04T08:57:06.4654620Z  * [new tag]                 ciflow/inductor/168115      -> ciflow/inductor/168115
2025-12-04T08:57:06.4655741Z  * [new tag]                 ciflow/inductor/168127      -> ciflow/inductor/168127
2025-12-04T08:57:06.4656860Z  * [new tag]                 ciflow/inductor/168129      -> ciflow/inductor/168129
2025-12-04T08:57:06.4657980Z  * [new tag]                 ciflow/inductor/168157      -> ciflow/inductor/168157
2025-12-04T08:57:06.4659093Z  * [new tag]                 ciflow/inductor/168175      -> ciflow/inductor/168175
2025-12-04T08:57:06.4660320Z  * [new tag]                 ciflow/inductor/168185      -> ciflow/inductor/168185
2025-12-04T08:57:06.4661235Z  * [new tag]                 ciflow/inductor/168195      -> ciflow/inductor/168195
2025-12-04T08:57:06.4662458Z  * [new tag]                 ciflow/inductor/168209      -> ciflow/inductor/168209
2025-12-04T08:57:06.4663606Z  * [new tag]                 ciflow/inductor/168266      -> ciflow/inductor/168266
2025-12-04T08:57:06.4664726Z  * [new tag]                 ciflow/inductor/168316      -> ciflow/inductor/168316
2025-12-04T08:57:06.4665997Z  * [new tag]                 ciflow/inductor/168326      -> ciflow/inductor/168326
2025-12-04T08:57:06.4667619Z  * [new tag]                 ciflow/inductor/168368      -> ciflow/inductor/168368
2025-12-04T08:57:06.4668801Z  * [new tag]                 ciflow/inductor/168894      -> ciflow/inductor/168894
2025-12-04T08:57:06.4670002Z  * [new tag]                 ciflow/inductor/168934      -> ciflow/inductor/168934
2025-12-04T08:57:06.4671113Z  * [new tag]                 ciflow/inductor/168939      -> ciflow/inductor/168939
2025-12-04T08:57:06.4672236Z  * [new tag]                 ciflow/inductor/168946      -> ciflow/inductor/168946
2025-12-04T08:57:06.4673372Z  * [new tag]                 ciflow/inductor/168950      -> ciflow/inductor/168950
2025-12-04T08:57:06.4674551Z  * [new tag]                 ciflow/inductor/168951      -> ciflow/inductor/168951
2025-12-04T08:57:06.4675624Z  * [new tag]                 ciflow/inductor/168952      -> ciflow/inductor/168952
2025-12-04T08:57:06.4676775Z  * [new tag]                 ciflow/inductor/168955      -> ciflow/inductor/168955
2025-12-04T08:57:06.4677896Z  * [new tag]                 ciflow/inductor/168971      -> ciflow/inductor/168971
2025-12-04T08:57:06.4679079Z  * [new tag]                 ciflow/inductor/168979      -> ciflow/inductor/168979
2025-12-04T08:57:06.4680220Z  * [new tag]                 ciflow/inductor/168980      -> ciflow/inductor/168980
2025-12-04T08:57:06.4681490Z  * [new tag]                 ciflow/inductor/168983      -> ciflow/inductor/168983
2025-12-04T08:57:06.4682676Z  * [new tag]                 ciflow/inductor/169006      -> ciflow/inductor/169006
2025-12-04T08:57:06.4683858Z  * [new tag]                 ciflow/inductor/169023      -> ciflow/inductor/169023
2025-12-04T08:57:06.4684948Z  * [new tag]                 ciflow/inductor/169024      -> ciflow/inductor/169024
2025-12-04T08:57:06.4686108Z  * [new tag]                 ciflow/inductor/169025      -> ciflow/inductor/169025
2025-12-04T08:57:06.4687247Z  * [new tag]                 ciflow/inductor/169066      -> ciflow/inductor/169066
2025-12-04T08:57:06.4688437Z  * [new tag]                 ciflow/inductor/169091      -> ciflow/inductor/169091
2025-12-04T08:57:06.4689572Z  * [new tag]                 ciflow/inductor/169102      -> ciflow/inductor/169102
2025-12-04T08:57:06.4690709Z  * [new tag]                 ciflow/inductor/169103      -> ciflow/inductor/169103
2025-12-04T08:57:06.4691843Z  * [new tag]                 ciflow/inductor/169121      -> ciflow/inductor/169121
2025-12-04T08:57:06.4693000Z  * [new tag]                 ciflow/inductor/169134      -> ciflow/inductor/169134
2025-12-04T08:57:06.4694087Z  * [new tag]                 ciflow/inductor/169135      -> ciflow/inductor/169135
2025-12-04T08:57:06.4695215Z  * [new tag]                 ciflow/inductor/169141      -> ciflow/inductor/169141
2025-12-04T08:57:06.4696340Z  * [new tag]                 ciflow/inductor/169151      -> ciflow/inductor/169151
2025-12-04T08:57:06.4697685Z  * [new tag]                 ciflow/inductor/169161      -> ciflow/inductor/169161
2025-12-04T08:57:06.4698889Z  * [new tag]                 ciflow/inductor/169167      -> ciflow/inductor/169167
2025-12-04T08:57:06.4700155Z  * [new tag]                 ciflow/inductor/169177      -> ciflow/inductor/169177
2025-12-04T08:57:06.4701409Z  * [new tag]                 ciflow/inductor/169185      -> ciflow/inductor/169185
2025-12-04T08:57:06.4702606Z  * [new tag]                 ciflow/inductor/169196      -> ciflow/inductor/169196
2025-12-04T08:57:06.4703936Z  * [new tag]                 ciflow/inductor/169200      -> ciflow/inductor/169200
2025-12-04T08:57:06.4704818Z  * [new tag]                 ciflow/inductor/169204      -> ciflow/inductor/169204
2025-12-04T08:57:06.4706072Z  * [new tag]                 ciflow/inductor/169216      -> ciflow/inductor/169216
2025-12-04T08:57:06.4707215Z  * [new tag]                 ciflow/inductor/169219      -> ciflow/inductor/169219
2025-12-04T08:57:06.4708323Z  * [new tag]                 ciflow/inductor/169220      -> ciflow/inductor/169220
2025-12-04T08:57:06.4709533Z  * [new tag]                 ciflow/inductor/169230      -> ciflow/inductor/169230
2025-12-04T08:57:06.4710706Z  * [new tag]                 ciflow/inductor/169242      -> ciflow/inductor/169242
2025-12-04T08:57:06.4711850Z  * [new tag]                 ciflow/inductor/169245      -> ciflow/inductor/169245
2025-12-04T08:57:06.4713061Z  * [new tag]                 ciflow/inductor/169260      -> ciflow/inductor/169260
2025-12-04T08:57:06.4714239Z  * [new tag]                 ciflow/inductor/169282      -> ciflow/inductor/169282
2025-12-04T08:57:06.4715396Z  * [new tag]                 ciflow/inductor/169286      -> ciflow/inductor/169286
2025-12-04T08:57:06.4716542Z  * [new tag]                 ciflow/inductor/169299      -> ciflow/inductor/169299
2025-12-04T08:57:06.4717983Z  * [new tag]                 ciflow/inductor/169304      -> ciflow/inductor/169304
2025-12-04T08:57:06.4719470Z  * [new tag]                 ciflow/inductor/169305      -> ciflow/inductor/169305
2025-12-04T08:57:06.4720766Z  * [new tag]                 ciflow/inductor/169308      -> ciflow/inductor/169308
2025-12-04T08:57:06.4721879Z  * [new tag]                 ciflow/inductor/169319      -> ciflow/inductor/169319
2025-12-04T08:57:06.4722991Z  * [new tag]                 ciflow/inductor/169326      -> ciflow/inductor/169326
2025-12-04T08:57:06.4724148Z  * [new tag]                 ciflow/inductor/169332      -> ciflow/inductor/169332
2025-12-04T08:57:06.4725348Z  * [new tag]                 ciflow/inductor/169333      -> ciflow/inductor/169333
2025-12-04T08:57:06.4726619Z  * [new tag]                 ciflow/inductor/169336      -> ciflow/inductor/169336
2025-12-04T08:57:06.4727831Z  * [new tag]                 ciflow/inductor/169340      -> ciflow/inductor/169340
2025-12-04T08:57:06.4728978Z  * [new tag]                 ciflow/inductor/169341      -> ciflow/inductor/169341
2025-12-04T08:57:06.4730175Z  * [new tag]                 ciflow/inductor/169343      -> ciflow/inductor/169343
2025-12-04T08:57:06.4731365Z  * [new tag]                 ciflow/inductor/169346      -> ciflow/inductor/169346
2025-12-04T08:57:06.4732579Z  * [new tag]                 ciflow/inductor/169348      -> ciflow/inductor/169348
2025-12-04T08:57:06.4733908Z  * [new tag]                 ciflow/inductor/169350      -> ciflow/inductor/169350
2025-12-04T08:57:06.4735095Z  * [new tag]                 ciflow/inductor/169355      -> ciflow/inductor/169355
2025-12-04T08:57:06.4736254Z  * [new tag]                 ciflow/inductor/169370      -> ciflow/inductor/169370
2025-12-04T08:57:06.4737566Z  * [new tag]                 ciflow/inductor/169375      -> ciflow/inductor/169375
2025-12-04T08:57:06.4738767Z  * [new tag]                 ciflow/inductor/169389      -> ciflow/inductor/169389
2025-12-04T08:57:06.4739950Z  * [new tag]                 ciflow/inductor/169391      -> ciflow/inductor/169391
2025-12-04T08:57:06.4741083Z  * [new tag]                 ciflow/inductor/169393      -> ciflow/inductor/169393
2025-12-04T08:57:06.4742262Z  * [new tag]                 ciflow/inductor/169399      -> ciflow/inductor/169399
2025-12-04T08:57:06.4743954Z  * [new tag]                 ciflow/inductor/169400      -> ciflow/inductor/169400
2025-12-04T08:57:06.4745168Z  * [new tag]                 ciflow/inductor/169415      -> ciflow/inductor/169415
2025-12-04T08:57:06.4746312Z  * [new tag]                 ciflow/inductor/169417      -> ciflow/inductor/169417
2025-12-04T08:57:06.4747596Z  * [new tag]                 ciflow/inductor/169418      -> ciflow/inductor/169418
2025-12-04T08:57:06.4748840Z  * [new tag]                 ciflow/inductor/169430      -> ciflow/inductor/169430
2025-12-04T08:57:06.4749996Z  * [new tag]                 ciflow/inductor/169432      -> ciflow/inductor/169432
2025-12-04T08:57:06.4751092Z  * [new tag]                 ciflow/inductor/169436      -> ciflow/inductor/169436
2025-12-04T08:57:06.4752300Z  * [new tag]                 ciflow/inductor/169437      -> ciflow/inductor/169437
2025-12-04T08:57:06.4753503Z  * [new tag]                 ciflow/inductor/169438      -> ciflow/inductor/169438
2025-12-04T08:57:06.4754676Z  * [new tag]                 ciflow/inductor/169441      -> ciflow/inductor/169441
2025-12-04T08:57:06.4755864Z  * [new tag]                 ciflow/inductor/169446      -> ciflow/inductor/169446
2025-12-04T08:57:06.4757275Z  * [new tag]                 ciflow/inductor/169447      -> ciflow/inductor/169447
2025-12-04T08:57:06.4758391Z  * [new tag]                 ciflow/inductor/169452      -> ciflow/inductor/169452
2025-12-04T08:57:06.4771673Z  * [new tag]                 ciflow/inductor/169455      -> ciflow/inductor/169455
2025-12-04T08:57:06.4772163Z  * [new tag]                 ciflow/inductor/169459      -> ciflow/inductor/169459
2025-12-04T08:57:06.4772520Z  * [new tag]                 ciflow/inductor/169463      -> ciflow/inductor/169463
2025-12-04T08:57:06.4772894Z  * [new tag]                 ciflow/inductor/169476      -> ciflow/inductor/169476
2025-12-04T08:57:06.4773293Z  * [new tag]                 ciflow/inductor/169485      -> ciflow/inductor/169485
2025-12-04T08:57:06.4773754Z  * [new tag]                 ciflow/inductor/169493      -> ciflow/inductor/169493
2025-12-04T08:57:06.4774124Z  * [new tag]                 ciflow/inductor/169496      -> ciflow/inductor/169496
2025-12-04T08:57:06.4774449Z  * [new tag]                 ciflow/inductor/169497      -> ciflow/inductor/169497
2025-12-04T08:57:06.4774783Z  * [new tag]                 ciflow/inductor/169503      -> ciflow/inductor/169503
2025-12-04T08:57:06.4775109Z  * [new tag]                 ciflow/inductor/169504      -> ciflow/inductor/169504
2025-12-04T08:57:06.4775434Z  * [new tag]                 ciflow/inductor/169505      -> ciflow/inductor/169505
2025-12-04T08:57:06.4775760Z  * [new tag]                 ciflow/inductor/169508      -> ciflow/inductor/169508
2025-12-04T08:57:06.4776083Z  * [new tag]                 ciflow/inductor/169509      -> ciflow/inductor/169509
2025-12-04T08:57:06.4776404Z  * [new tag]                 ciflow/inductor/169513      -> ciflow/inductor/169513
2025-12-04T08:57:06.4777019Z  * [new tag]                 ciflow/inductor/169514      -> ciflow/inductor/169514
2025-12-04T08:57:06.4777926Z  * [new tag]                 ciflow/inductor/169515      -> ciflow/inductor/169515
2025-12-04T08:57:06.4778961Z  * [new tag]                 ciflow/inductor/169517      -> ciflow/inductor/169517
2025-12-04T08:57:06.4780053Z  * [new tag]                 ciflow/inductor/169519      -> ciflow/inductor/169519
2025-12-04T08:57:06.4781176Z  * [new tag]                 ciflow/inductor/169520      -> ciflow/inductor/169520
2025-12-04T08:57:06.4782389Z  * [new tag]                 ciflow/inductor/169521      -> ciflow/inductor/169521
2025-12-04T08:57:06.4783530Z  * [new tag]                 ciflow/inductor/169524      -> ciflow/inductor/169524
2025-12-04T08:57:06.4784666Z  * [new tag]                 ciflow/inductor/169527      -> ciflow/inductor/169527
2025-12-04T08:57:06.4785826Z  * [new tag]                 ciflow/inductor/169528      -> ciflow/inductor/169528
2025-12-04T08:57:06.4787166Z  * [new tag]                 ciflow/inductor/169532      -> ciflow/inductor/169532
2025-12-04T08:57:06.4788235Z  * [new tag]                 ciflow/inductor/169535      -> ciflow/inductor/169535
2025-12-04T08:57:06.4789342Z  * [new tag]                 ciflow/inductor/169536      -> ciflow/inductor/169536
2025-12-04T08:57:06.4790497Z  * [new tag]                 ciflow/inductor/169547      -> ciflow/inductor/169547
2025-12-04T08:57:06.4791878Z  * [new tag]                 ciflow/inductor/169548      -> ciflow/inductor/169548
2025-12-04T08:57:06.4792898Z  * [new tag]                 ciflow/inductor/169549      -> ciflow/inductor/169549
2025-12-04T08:57:06.4794021Z  * [new tag]                 ciflow/inductor/169551      -> ciflow/inductor/169551
2025-12-04T08:57:06.4795186Z  * [new tag]                 ciflow/inductor/169552      -> ciflow/inductor/169552
2025-12-04T08:57:06.4796633Z  * [new tag]                 ciflow/inductor/169553      -> ciflow/inductor/169553
2025-12-04T08:57:06.4797753Z  * [new tag]                 ciflow/inductor/3b9a386     -> ciflow/inductor/3b9a386
2025-12-04T08:57:06.4799193Z  * [new tag]                 ciflow/inductor/3d4b92b     -> ciflow/inductor/3d4b92b
2025-12-04T08:57:06.4800447Z  * [new tag]                 ciflow/inductor/d224ac7     -> ciflow/inductor/d224ac7
2025-12-04T08:57:06.4801946Z  * [new tag]                 ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994
2025-12-04T08:57:06.4802895Z  * [new tag]                 ciflow/linux-aarch64/166075 -> ciflow/linux-aarch64/166075
2025-12-04T08:57:06.4803932Z  * [new tag]                 ciflow/linux-aarch64/166876 -> ciflow/linux-aarch64/166876
2025-12-04T08:57:06.4805001Z  * [new tag]                 ciflow/linux-aarch64/167981 -> ciflow/linux-aarch64/167981
2025-12-04T08:57:06.4806503Z  * [new tag]                 ciflow/mps/166254           -> ciflow/mps/166254
2025-12-04T08:57:06.4807296Z  * [new tag]                 ciflow/mps/169017           -> ciflow/mps/169017
2025-12-04T08:57:06.4808591Z  * [new tag]                 ciflow/mps/169372           -> ciflow/mps/169372
2025-12-04T08:57:06.4809596Z  * [new tag]                 ciflow/mps/169478           -> ciflow/mps/169478
2025-12-04T08:57:06.4811082Z  * [new tag]                 ciflow/op-benchmark/157994  -> ciflow/op-benchmark/157994
2025-12-04T08:57:06.4812016Z  * [new tag]                 ciflow/op-benchmark/166075  -> ciflow/op-benchmark/166075
2025-12-04T08:57:06.4813041Z  * [new tag]                 ciflow/op-benchmark/169544  -> ciflow/op-benchmark/169544
2025-12-04T08:57:06.4814528Z  * [new tag]                 ciflow/periodic-rocm-mi200/165997 -> ciflow/periodic-rocm-mi200/165997
2025-12-04T08:57:06.4815652Z  * [new tag]                 ciflow/periodic-rocm-mi200/166517 -> ciflow/periodic-rocm-mi200/166517
2025-12-04T08:57:06.4816730Z  * [new tag]                 ciflow/periodic-rocm-mi200/169063 -> ciflow/periodic-rocm-mi200/169063
2025-12-04T08:57:06.4819583Z  * [new tag]                 ciflow/periodic-rocm-mi200/169425 -> ciflow/periodic-rocm-mi200/169425
2025-12-04T08:57:06.4820801Z  * [new tag]                 ciflow/periodic-rocm-mi300/166517 -> ciflow/periodic-rocm-mi300/166517
2025-12-04T08:57:06.4821988Z  * [new tag]                 ciflow/periodic-rocm-mi300/169063 -> ciflow/periodic-rocm-mi300/169063
2025-12-04T08:57:06.4823061Z  * [new tag]                 ciflow/periodic-rocm-mi300/169425 -> ciflow/periodic-rocm-mi300/169425
2025-12-04T08:57:06.4824601Z  * [new tag]                 ciflow/periodic/054a2fd     -> ciflow/periodic/054a2fd
2025-12-04T08:57:06.4825618Z  * [new tag]                 ciflow/periodic/167207      -> ciflow/periodic/167207
2025-12-04T08:57:06.4826756Z  * [new tag]                 ciflow/periodic/167978      -> ciflow/periodic/167978
2025-12-04T08:57:06.4827819Z  * [new tag]                 ciflow/periodic/168096      -> ciflow/periodic/168096
2025-12-04T08:57:06.4828884Z  * [new tag]                 ciflow/periodic/169286      -> ciflow/periodic/169286
2025-12-04T08:57:06.4830229Z  * [new tag]                 ciflow/periodic/2a6d37d     -> ciflow/periodic/2a6d37d
2025-12-04T08:57:06.4831354Z  * [new tag]                 ciflow/periodic/317eeb8     -> ciflow/periodic/317eeb8
2025-12-04T08:57:06.4832531Z  * [new tag]                 ciflow/periodic/3c32        -> ciflow/periodic/3c32
2025-12-04T08:57:06.4833900Z  * [new tag]                 ciflow/periodic/3e98831     -> ciflow/periodic/3e98831
2025-12-04T08:57:06.4835833Z  * [new tag]                 ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9
2025-12-04T08:57:06.4836977Z  * [new tag]                 ciflow/periodic/94512-point -> ciflow/periodic/94512-point
2025-12-04T08:57:06.4838642Z  * [new tag]                 ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519
2025-12-04T08:57:06.4839850Z  * [new tag]                 ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275
2025-12-04T08:57:06.4841413Z  * [new tag]                 ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761
2025-12-04T08:57:06.4842627Z  * [new tag]                 ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12
2025-12-04T08:57:06.4844114Z  * [new tag]                 ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0
2025-12-04T08:57:06.4845551Z  * [new tag]                 ciflow/periodic/sha-ec5b83  -> ciflow/periodic/sha-ec5b83
2025-12-04T08:57:06.4846684Z  * [new tag]                 ciflow/pull/167207          -> ciflow/pull/167207
2025-12-04T08:57:06.4848328Z  * [new tag]                 ciflow/quantization-periodic/169207 -> ciflow/quantization-periodic/169207
2025-12-04T08:57:06.4849481Z  * [new tag]                 ciflow/rocm-mi200/165545    -> ciflow/rocm-mi200/165545
2025-12-04T08:57:06.4850638Z  * [new tag]                 ciflow/rocm-mi200/165997    -> ciflow/rocm-mi200/165997
2025-12-04T08:57:06.4851730Z  * [new tag]                 ciflow/rocm-mi200/168096    -> ciflow/rocm-mi200/168096
2025-12-04T08:57:06.4852909Z  * [new tag]                 ciflow/rocm-mi200/168275    -> ciflow/rocm-mi200/168275
2025-12-04T08:57:06.4853962Z  * [new tag]                 ciflow/rocm-mi200/169063    -> ciflow/rocm-mi200/169063
2025-12-04T08:57:06.4855648Z  * [new tag]                 ciflow/rocm-mi200/169356    -> ciflow/rocm-mi200/169356
2025-12-04T08:57:06.4856690Z  * [new tag]                 ciflow/rocm-mi200/169425    -> ciflow/rocm-mi200/169425
2025-12-04T08:57:06.4858071Z  * [new tag]                 ciflow/rocm-mi300/165545    -> ciflow/rocm-mi300/165545
2025-12-04T08:57:06.4859208Z  * [new tag]                 ciflow/rocm-mi300/167157    -> ciflow/rocm-mi300/167157
2025-12-04T08:57:06.4860282Z  * [new tag]                 ciflow/rocm-mi300/168096    -> ciflow/rocm-mi300/168096
2025-12-04T08:57:06.4861890Z  * [new tag]                 ciflow/rocm-mi300/169063    -> ciflow/rocm-mi300/169063
2025-12-04T08:57:06.4862947Z  * [new tag]                 ciflow/rocm-mi300/169425    -> ciflow/rocm-mi300/169425
2025-12-04T08:57:06.4864284Z  * [new tag]                 ciflow/rocm-mi355/167157    -> ciflow/rocm-mi355/167157
2025-12-04T08:57:06.4865292Z  * [new tag]                 ciflow/rocm-mi355/168275    -> ciflow/rocm-mi355/168275
2025-12-04T08:57:06.4866415Z  * [new tag]                 ciflow/rocm-mi355/169425    -> ciflow/rocm-mi355/169425
2025-12-04T08:57:06.4867862Z  * [new tag]                 ciflow/rocm-navi31/168275   -> ciflow/rocm-navi31/168275
2025-12-04T08:57:06.4868912Z  * [new tag]                 ciflow/rocm-navi31/169425   -> ciflow/rocm-navi31/169425
2025-12-04T08:57:06.4870235Z  * [new tag]                 ciflow/rocm/115316          -> ciflow/rocm/115316
2025-12-04T08:57:06.4871265Z  * [new tag]                 ciflow/rocm/148492          -> ciflow/rocm/148492
2025-12-04T08:57:06.4872273Z  * [new tag]                 ciflow/rocm/160685          -> ciflow/rocm/160685
2025-12-04T08:57:06.4873327Z  * [new tag]                 ciflow/rocm/161607          -> ciflow/rocm/161607
2025-12-04T08:57:06.4874372Z  * [new tag]                 ciflow/rocm/162052          -> ciflow/rocm/162052
2025-12-04T08:57:06.4875441Z  * [new tag]                 ciflow/rocm/165997          -> ciflow/rocm/165997
2025-12-04T08:57:06.4876471Z  * [new tag]                 ciflow/rocm/166165          -> ciflow/rocm/166165
2025-12-04T08:57:06.4877541Z  * [new tag]                 ciflow/rocm/166517          -> ciflow/rocm/166517
2025-12-04T08:57:06.4878581Z  * [new tag]                 ciflow/rocm/167207          -> ciflow/rocm/167207
2025-12-04T08:57:06.4879750Z  * [new tag]                 ciflow/rocm/167536          -> ciflow/rocm/167536
2025-12-04T08:57:06.4880839Z  * [new tag]                 ciflow/rocm/167781          -> ciflow/rocm/167781
2025-12-04T08:57:06.4882235Z  * [new tag]                 ciflow/rocm/167989          -> ciflow/rocm/167989
2025-12-04T08:57:06.4883721Z  * [new tag]                 ciflow/rocm/168073          -> ciflow/rocm/168073
2025-12-04T08:57:06.4885090Z  * [new tag]                 ciflow/rocm/168195          -> ciflow/rocm/168195
2025-12-04T08:57:06.4886136Z  * [new tag]                 ciflow/rocm/168939          -> ciflow/rocm/168939
2025-12-04T08:57:06.4887242Z  * [new tag]                 ciflow/rocm/168971          -> ciflow/rocm/168971
2025-12-04T08:57:06.4888357Z  * [new tag]                 ciflow/rocm/169024          -> ciflow/rocm/169024
2025-12-04T08:57:06.4889514Z  * [new tag]                 ciflow/rocm/169200          -> ciflow/rocm/169200
2025-12-04T08:57:06.4890600Z  * [new tag]                 ciflow/rocm/169216          -> ciflow/rocm/169216
2025-12-04T08:57:06.4891711Z  * [new tag]                 ciflow/rocm/169312          -> ciflow/rocm/169312
2025-12-04T08:57:06.4892833Z  * [new tag]                 ciflow/rocm/169380          -> ciflow/rocm/169380
2025-12-04T08:57:06.4894051Z  * [new tag]                 ciflow/rocm/169427          -> ciflow/rocm/169427
2025-12-04T08:57:06.4895118Z  * [new tag]                 ciflow/rocm/169455          -> ciflow/rocm/169455
2025-12-04T08:57:06.4896228Z  * [new tag]                 ciflow/rocm/169470          -> ciflow/rocm/169470
2025-12-04T08:57:06.4897353Z  * [new tag]                 ciflow/rocm/169471          -> ciflow/rocm/169471
2025-12-04T08:57:06.4898645Z  * [new tag]                 ciflow/rocm/169472          -> ciflow/rocm/169472
2025-12-04T08:57:06.4899617Z  * [new tag]                 ciflow/rocm/169514          -> ciflow/rocm/169514
2025-12-04T08:57:06.4901097Z  * [new tag]                 ciflow/slow/01c7106         -> ciflow/slow/01c7106
2025-12-04T08:57:06.4902241Z  * [new tag]                 ciflow/slow/0577043         -> ciflow/slow/0577043
2025-12-04T08:57:06.4903920Z  * [new tag]                 ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym
2025-12-04T08:57:06.4904785Z  * [new tag]                 ciflow/slow/0e81104         -> ciflow/slow/0e81104
2025-12-04T08:57:06.4905860Z  * [new tag]                 ciflow/slow/167207          -> ciflow/slow/167207
2025-12-04T08:57:06.4906930Z  * [new tag]                 ciflow/slow/168050          -> ciflow/slow/168050
2025-12-04T08:57:06.4908256Z  * [new tag]                 ciflow/slow/1732077         -> ciflow/slow/1732077
2025-12-04T08:57:06.4909383Z  * [new tag]                 ciflow/slow/187eb7c         -> ciflow/slow/187eb7c
2025-12-04T08:57:06.4911013Z  * [new tag]                 ciflow/slow/1faef89         -> ciflow/slow/1faef89
2025-12-04T08:57:06.4912634Z  * [new tag]                 ciflow/slow/3920ec1         -> ciflow/slow/3920ec1
2025-12-04T08:57:06.4914099Z  * [new tag]                 ciflow/slow/3b7c6b2         -> ciflow/slow/3b7c6b2
2025-12-04T08:57:06.4915427Z  * [new tag]                 ciflow/slow/59a3759         -> ciflow/slow/59a3759
2025-12-04T08:57:06.4916745Z  * [new tag]                 ciflow/slow/70ef0bb         -> ciflow/slow/70ef0bb
2025-12-04T08:57:06.4918238Z  * [new tag]                 ciflow/slow/788ff06         -> ciflow/slow/788ff06
2025-12-04T08:57:06.4919834Z  * [new tag]                 ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym
2025-12-04T08:57:06.4920939Z  * [new tag]                 ciflow/slow/9d85864         -> ciflow/slow/9d85864
2025-12-04T08:57:06.4922326Z  * [new tag]                 ciflow/slow/9ffad5b         -> ciflow/slow/9ffad5b
2025-12-04T08:57:06.4923525Z  * [new tag]                 ciflow/slow/a206e8b         -> ciflow/slow/a206e8b
2025-12-04T08:57:06.4925020Z  * [new tag]                 ciflow/slow/a837609         -> ciflow/slow/a837609
2025-12-04T08:57:06.4926093Z  * [new tag]                 ciflow/slow/af841f3         -> ciflow/slow/af841f3
2025-12-04T08:57:06.4927833Z  * [new tag]                 ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym
2025-12-04T08:57:06.4928819Z  * [new tag]                 ciflow/torchbench/168175    -> ciflow/torchbench/168175
2025-12-04T08:57:06.4930203Z  * [new tag]                 ciflow/trunk/148492         -> ciflow/trunk/148492
2025-12-04T08:57:06.4931189Z  * [new tag]                 ciflow/trunk/157149         -> ciflow/trunk/157149
2025-12-04T08:57:06.4932252Z  * [new tag]                 ciflow/trunk/157994         -> ciflow/trunk/157994
2025-12-04T08:57:06.4933314Z  * [new tag]                 ciflow/trunk/159718         -> ciflow/trunk/159718
2025-12-04T08:57:06.4934414Z  * [new tag]                 ciflow/trunk/160685         -> ciflow/trunk/160685
2025-12-04T08:57:06.4935455Z  * [new tag]                 ciflow/trunk/160729         -> ciflow/trunk/160729
2025-12-04T08:57:06.4936518Z  * [new tag]                 ciflow/trunk/162275         -> ciflow/trunk/162275
2025-12-04T08:57:06.4937599Z  * [new tag]                 ciflow/trunk/162795         -> ciflow/trunk/162795
2025-12-04T08:57:06.4938675Z  * [new tag]                 ciflow/trunk/163245         -> ciflow/trunk/163245
2025-12-04T08:57:06.4939735Z  * [new tag]                 ciflow/trunk/163942         -> ciflow/trunk/163942
2025-12-04T08:57:06.4940748Z  * [new tag]                 ciflow/trunk/165274         -> ciflow/trunk/165274
2025-12-04T08:57:06.4942321Z  * [new tag]                 ciflow/trunk/165483         -> ciflow/trunk/165483
2025-12-04T08:57:06.4943757Z  * [new tag]                 ciflow/trunk/165728         -> ciflow/trunk/165728
2025-12-04T08:57:06.4945103Z  * [new tag]                 ciflow/trunk/165922         -> ciflow/trunk/165922
2025-12-04T08:57:06.4946446Z  * [new tag]                 ciflow/trunk/166075         -> ciflow/trunk/166075
2025-12-04T08:57:06.4947496Z  * [new tag]                 ciflow/trunk/166165         -> ciflow/trunk/166165
2025-12-04T08:57:06.4948609Z  * [new tag]                 ciflow/trunk/166829         -> ciflow/trunk/166829
2025-12-04T08:57:06.4949989Z  * [new tag]                 ciflow/trunk/166843         -> ciflow/trunk/166843
2025-12-04T08:57:06.4951056Z  * [new tag]                 ciflow/trunk/166876         -> ciflow/trunk/166876
2025-12-04T08:57:06.4952172Z  * [new tag]                 ciflow/trunk/167207         -> ciflow/trunk/167207
2025-12-04T08:57:06.4953291Z  * [new tag]                 ciflow/trunk/167536         -> ciflow/trunk/167536
2025-12-04T08:57:06.4954518Z  * [new tag]                 ciflow/trunk/167552         -> ciflow/trunk/167552
2025-12-04T08:57:06.4955598Z  * [new tag]                 ciflow/trunk/167555         -> ciflow/trunk/167555
2025-12-04T08:57:06.4956815Z  * [new tag]                 ciflow/trunk/167599         -> ciflow/trunk/167599
2025-12-04T08:57:06.4957943Z  * [new tag]                 ciflow/trunk/167659         -> ciflow/trunk/167659
2025-12-04T08:57:06.4959377Z  * [new tag]                 ciflow/trunk/167672         -> ciflow/trunk/167672
2025-12-04T08:57:06.4960475Z  * [new tag]                 ciflow/trunk/167742         -> ciflow/trunk/167742
2025-12-04T08:57:06.4961612Z  * [new tag]                 ciflow/trunk/167781         -> ciflow/trunk/167781
2025-12-04T08:57:06.4962978Z  * [new tag]                 ciflow/trunk/167837         -> ciflow/trunk/167837
2025-12-04T08:57:06.4964048Z  * [new tag]                 ciflow/trunk/167887         -> ciflow/trunk/167887
2025-12-04T08:57:06.4965156Z  * [new tag]                 ciflow/trunk/167978         -> ciflow/trunk/167978
2025-12-04T08:57:06.4966381Z  * [new tag]                 ciflow/trunk/168050         -> ciflow/trunk/168050
2025-12-04T08:57:06.4967533Z  * [new tag]                 ciflow/trunk/168051         -> ciflow/trunk/168051
2025-12-04T08:57:06.4968811Z  * [new tag]                 ciflow/trunk/168096         -> ciflow/trunk/168096
2025-12-04T08:57:06.4969785Z  * [new tag]                 ciflow/trunk/168127         -> ciflow/trunk/168127
2025-12-04T08:57:06.4970840Z  * [new tag]                 ciflow/trunk/168157         -> ciflow/trunk/168157
2025-12-04T08:57:06.4972064Z  * [new tag]                 ciflow/trunk/168175         -> ciflow/trunk/168175
2025-12-04T08:57:06.4973152Z  * [new tag]                 ciflow/trunk/168209         -> ciflow/trunk/168209
2025-12-04T08:57:06.4974333Z  * [new tag]                 ciflow/trunk/168213         -> ciflow/trunk/168213
2025-12-04T08:57:06.4975665Z  * [new tag]                 ciflow/trunk/168226         -> ciflow/trunk/168226
2025-12-04T08:57:06.4976768Z  * [new tag]                 ciflow/trunk/168262         -> ciflow/trunk/168262
2025-12-04T08:57:06.4977961Z  * [new tag]                 ciflow/trunk/168275         -> ciflow/trunk/168275
2025-12-04T08:57:06.4979140Z  * [new tag]                 ciflow/trunk/168328         -> ciflow/trunk/168328
2025-12-04T08:57:06.4980298Z  * [new tag]                 ciflow/trunk/168368         -> ciflow/trunk/168368
2025-12-04T08:57:06.4981489Z  * [new tag]                 ciflow/trunk/168917         -> ciflow/trunk/168917
2025-12-04T08:57:06.4982582Z  * [new tag]                 ciflow/trunk/168933         -> ciflow/trunk/168933
2025-12-04T08:57:06.4983981Z  * [new tag]                 ciflow/trunk/168941         -> ciflow/trunk/168941
2025-12-04T08:57:06.4985050Z  * [new tag]                 ciflow/trunk/168955         -> ciflow/trunk/168955
2025-12-04T08:57:06.4986089Z  * [new tag]                 ciflow/trunk/168980         -> ciflow/trunk/168980
2025-12-04T08:57:06.4987432Z  * [new tag]                 ciflow/trunk/169004         -> ciflow/trunk/169004
2025-12-04T08:57:06.4988588Z  * [new tag]                 ciflow/trunk/169006         -> ciflow/trunk/169006
2025-12-04T08:57:06.4989636Z  * [new tag]                 ciflow/trunk/169023         -> ciflow/trunk/169023
2025-12-04T08:57:06.4990706Z  * [new tag]                 ciflow/trunk/169025         -> ciflow/trunk/169025
2025-12-04T08:57:06.4991867Z  * [new tag]                 ciflow/trunk/169048         -> ciflow/trunk/169048
2025-12-04T08:57:06.4993521Z  * [new tag]                 ciflow/trunk/169066         -> ciflow/trunk/169066
2025-12-04T08:57:06.4994694Z  * [new tag]                 ciflow/trunk/169091         -> ciflow/trunk/169091
2025-12-04T08:57:06.4995786Z  * [new tag]                 ciflow/trunk/169102         -> ciflow/trunk/169102
2025-12-04T08:57:06.4996957Z  * [new tag]                 ciflow/trunk/169103         -> ciflow/trunk/169103
2025-12-04T08:57:06.4998239Z  * [new tag]                 ciflow/trunk/169125         -> ciflow/trunk/169125
2025-12-04T08:57:06.4999497Z  * [new tag]                 ciflow/trunk/169139         -> ciflow/trunk/169139
2025-12-04T08:57:06.5000985Z  * [new tag]                 ciflow/trunk/169148         -> ciflow/trunk/169148
2025-12-04T08:57:06.5002120Z  * [new tag]                 ciflow/trunk/169151         -> ciflow/trunk/169151
2025-12-04T08:57:06.5003204Z  * [new tag]                 ciflow/trunk/169156         -> ciflow/trunk/169156
2025-12-04T08:57:06.5004648Z  * [new tag]                 ciflow/trunk/169176         -> ciflow/trunk/169176
2025-12-04T08:57:06.5005744Z  * [new tag]                 ciflow/trunk/169204         -> ciflow/trunk/169204
2025-12-04T08:57:06.5006794Z  * [new tag]                 ciflow/trunk/169207         -> ciflow/trunk/169207
2025-12-04T08:57:06.5007971Z  * [new tag]                 ciflow/trunk/169211         -> ciflow/trunk/169211
2025-12-04T08:57:06.5009491Z  * [new tag]                 ciflow/trunk/169229         -> ciflow/trunk/169229
2025-12-04T08:57:06.5010607Z  * [new tag]                 ciflow/trunk/169231         -> ciflow/trunk/169231
2025-12-04T08:57:06.5011715Z  * [new tag]                 ciflow/trunk/169260         -> ciflow/trunk/169260
2025-12-04T08:57:06.5013247Z  * [new tag]                 ciflow/trunk/169271         -> ciflow/trunk/169271
2025-12-04T08:57:06.5014449Z  * [new tag]                 ciflow/trunk/169280         -> ciflow/trunk/169280
2025-12-04T08:57:06.5015471Z  * [new tag]                 ciflow/trunk/169281         -> ciflow/trunk/169281
2025-12-04T08:57:06.5016582Z  * [new tag]                 ciflow/trunk/169286         -> ciflow/trunk/169286
2025-12-04T08:57:06.5017985Z  * [new tag]                 ciflow/trunk/169293         -> ciflow/trunk/169293
2025-12-04T08:57:06.5019135Z  * [new tag]                 ciflow/trunk/169296         -> ciflow/trunk/169296
2025-12-04T08:57:06.5020374Z  * [new tag]                 ciflow/trunk/169304         -> ciflow/trunk/169304
2025-12-04T08:57:06.5021443Z  * [new tag]                 ciflow/trunk/169305         -> ciflow/trunk/169305
2025-12-04T08:57:06.5022633Z  * [new tag]                 ciflow/trunk/169312         -> ciflow/trunk/169312
2025-12-04T08:57:06.5024172Z  * [new tag]                 ciflow/trunk/169328         -> ciflow/trunk/169328
2025-12-04T08:57:06.5025155Z  * [new tag]                 ciflow/trunk/169343         -> ciflow/trunk/169343
2025-12-04T08:57:06.5026374Z  * [new tag]                 ciflow/trunk/169355         -> ciflow/trunk/169355
2025-12-04T08:57:06.5027377Z  * [new tag]                 ciflow/trunk/169370         -> ciflow/trunk/169370
2025-12-04T08:57:06.5028740Z  * [new tag]                 ciflow/trunk/169379         -> ciflow/trunk/169379
2025-12-04T08:57:06.5029841Z  * [new tag]                 ciflow/trunk/169380         -> ciflow/trunk/169380
2025-12-04T08:57:06.5030939Z  * [new tag]                 ciflow/trunk/169385         -> ciflow/trunk/169385
2025-12-04T08:57:06.5032041Z  * [new tag]                 ciflow/trunk/169387         -> ciflow/trunk/169387
2025-12-04T08:57:06.5033360Z  * [new tag]                 ciflow/trunk/169410         -> ciflow/trunk/169410
2025-12-04T08:57:06.5034516Z  * [new tag]                 ciflow/trunk/169412         -> ciflow/trunk/169412
2025-12-04T08:57:06.5035669Z  * [new tag]                 ciflow/trunk/169418         -> ciflow/trunk/169418
2025-12-04T08:57:06.5036969Z  * [new tag]                 ciflow/trunk/169423         -> ciflow/trunk/169423
2025-12-04T08:57:06.5037975Z  * [new tag]                 ciflow/trunk/169427         -> ciflow/trunk/169427
2025-12-04T08:57:06.5039072Z  * [new tag]                 ciflow/trunk/169430         -> ciflow/trunk/169430
2025-12-04T08:57:06.5040325Z  * [new tag]                 ciflow/trunk/169437         -> ciflow/trunk/169437
2025-12-04T08:57:06.5041468Z  * [new tag]                 ciflow/trunk/169442         -> ciflow/trunk/169442
2025-12-04T08:57:06.5042598Z  * [new tag]                 ciflow/trunk/169452         -> ciflow/trunk/169452
2025-12-04T08:57:06.5043738Z  * [new tag]                 ciflow/trunk/169454         -> ciflow/trunk/169454
2025-12-04T08:57:06.5044843Z  * [new tag]                 ciflow/trunk/169459         -> ciflow/trunk/169459
2025-12-04T08:57:06.5046183Z  * [new tag]                 ciflow/trunk/169474         -> ciflow/trunk/169474
2025-12-04T08:57:06.5047266Z  * [new tag]                 ciflow/trunk/169475         -> ciflow/trunk/169475
2025-12-04T08:57:06.5048401Z  * [new tag]                 ciflow/trunk/169476         -> ciflow/trunk/169476
2025-12-04T08:57:06.5049696Z  * [new tag]                 ciflow/trunk/169487         -> ciflow/trunk/169487
2025-12-04T08:57:06.5050899Z  * [new tag]                 ciflow/trunk/169497         -> ciflow/trunk/169497
2025-12-04T08:57:06.5052079Z  * [new tag]                 ciflow/trunk/169503         -> ciflow/trunk/169503
2025-12-04T08:57:06.5053259Z  * [new tag]                 ciflow/trunk/169505         -> ciflow/trunk/169505
2025-12-04T08:57:06.5054350Z  * [new tag]                 ciflow/trunk/169507         -> ciflow/trunk/169507
2025-12-04T08:57:06.5055484Z  * [new tag]                 ciflow/trunk/169514         -> ciflow/trunk/169514
2025-12-04T08:57:06.5056736Z  * [new tag]                 ciflow/trunk/169517         -> ciflow/trunk/169517
2025-12-04T08:57:06.5057930Z  * [new tag]                 ciflow/trunk/169519         -> ciflow/trunk/169519
2025-12-04T08:57:06.5058977Z  * [new tag]                 ciflow/trunk/169528         -> ciflow/trunk/169528
2025-12-04T08:57:06.5060067Z  * [new tag]                 ciflow/trunk/169541         -> ciflow/trunk/169541
2025-12-04T08:57:06.5061310Z  * [new tag]                 ciflow/trunk/169555         -> ciflow/trunk/169555
2025-12-04T08:57:06.5063029Z  * [new tag]                 ciflow/unstable/123         -> ciflow/unstable/123
2025-12-04T08:57:06.5064273Z  * [new tag]                 ciflow/vllm/165270          -> ciflow/vllm/165270
2025-12-04T08:57:06.5065365Z  * [new tag]                 ciflow/vllm/165274          -> ciflow/vllm/165274
2025-12-04T08:57:06.5066524Z  * [new tag]                 ciflow/vllm/166494          -> ciflow/vllm/166494
2025-12-04T08:57:06.5067605Z  * [new tag]                 ciflow/vllm/169219          -> ciflow/vllm/169219
2025-12-04T08:57:06.5068610Z  * [new tag]                 ciflow/vllm/169220          -> ciflow/vllm/169220
2025-12-04T08:57:06.5070598Z  * [new tag]                 ciflow/xpu/157994           -> ciflow/xpu/157994
2025-12-04T08:57:06.5071584Z  * [new tag]                 ciflow/xpu/159718           -> ciflow/xpu/159718
2025-12-04T08:57:06.5072642Z  * [new tag]                 ciflow/xpu/161940           -> ciflow/xpu/161940
2025-12-04T08:57:06.5073901Z  * [new tag]                 ciflow/xpu/163251           -> ciflow/xpu/163251
2025-12-04T08:57:06.5074894Z  * [new tag]                 ciflow/xpu/166829           -> ciflow/xpu/166829
2025-12-04T08:57:06.5075930Z  * [new tag]                 ciflow/xpu/166843           -> ciflow/xpu/166843
2025-12-04T08:57:06.5077236Z  * [new tag]                 ciflow/xpu/167972           -> ciflow/xpu/167972
2025-12-04T08:57:06.5078168Z  * [new tag]                 ciflow/xpu/167981           -> ciflow/xpu/167981
2025-12-04T08:57:06.5079191Z  * [new tag]                 ciflow/xpu/168213           -> ciflow/xpu/168213
2025-12-04T08:57:06.5080353Z  * [new tag]                 ciflow/xpu/168262           -> ciflow/xpu/168262
2025-12-04T08:57:06.5081386Z  * [new tag]                 ciflow/xpu/168328           -> ciflow/xpu/168328
2025-12-04T08:57:06.5082857Z  * [new tag]                 ciflow/xpu/168950           -> ciflow/xpu/168950
2025-12-04T08:57:06.5084472Z  * [new tag]                 ciflow/xpu/169039           -> ciflow/xpu/169039
2025-12-04T08:57:06.5085628Z  * [new tag]                 ciflow/xpu/169200           -> ciflow/xpu/169200
2025-12-04T08:57:06.5086786Z  * [new tag]                 ciflow/xpu/169203           -> ciflow/xpu/169203
2025-12-04T08:57:06.5087989Z  * [new tag]                 ciflow/xpu/169229           -> ciflow/xpu/169229
2025-12-04T08:57:06.5089118Z  * [new tag]                 ciflow/xpu/169230           -> ciflow/xpu/169230
2025-12-04T08:57:06.5090178Z  * [new tag]                 ciflow/xpu/169231           -> ciflow/xpu/169231
2025-12-04T08:57:06.5091387Z  * [new tag]                 ciflow/xpu/169241           -> ciflow/xpu/169241
2025-12-04T08:57:06.5092540Z  * [new tag]                 ciflow/xpu/169280           -> ciflow/xpu/169280
2025-12-04T08:57:06.5093778Z  * [new tag]                 ciflow/xpu/169296           -> ciflow/xpu/169296
2025-12-04T08:57:06.5094926Z  * [new tag]                 ciflow/xpu/169353           -> ciflow/xpu/169353
2025-12-04T08:57:06.5096062Z  * [new tag]                 ciflow/xpu/169410           -> ciflow/xpu/169410
2025-12-04T08:57:06.5097247Z  * [new tag]                 ciflow/xpu/169442           -> ciflow/xpu/169442
2025-12-04T08:57:06.5098484Z  * [new tag]                 ciflow/xpu/169555           -> ciflow/xpu/169555
2025-12-04T08:57:06.5099553Z  * [new tag]                 cslpull75                   -> cslpull75
2025-12-04T08:57:06.5100673Z  * [new tag]                 cslpull76                   -> cslpull76
2025-12-04T08:57:06.5101743Z  * [new tag]                 cslpull77                   -> cslpull77
2025-12-04T08:57:06.5102918Z  * [new tag]                 cslpull78                   -> cslpull78
2025-12-04T08:57:06.5104566Z  * [new tag]                 cslpull79                   -> cslpull79
2025-12-04T08:57:06.5105763Z  * [new tag]                 cslpull80                   -> cslpull80
2025-12-04T08:57:06.5107124Z  * [new tag]                 cslpull81                   -> cslpull81
2025-12-04T08:57:06.5108427Z  * [new tag]                 cslpull82                   -> cslpull82
2025-12-04T08:57:06.5109526Z  * [new tag]                 cslpull83                   -> cslpull83
2025-12-04T08:57:06.5110673Z  * [new tag]                 cslpull84                   -> cslpull84
2025-12-04T08:57:06.5111890Z  * [new tag]                 cslpull85                   -> cslpull85
2025-12-04T08:57:06.5113273Z  * [new tag]                 cslpull86                   -> cslpull86
2025-12-04T08:57:06.5114423Z  * [new tag]                 cslpull87                   -> cslpull87
2025-12-04T08:57:06.5115518Z  * [new tag]                 cslpull88                   -> cslpull88
2025-12-04T08:57:06.5116754Z  * [new tag]                 cslpull89                   -> cslpull89
2025-12-04T08:57:06.5118062Z  * [new tag]                 cslpull90                   -> cslpull90
2025-12-04T08:57:06.5119620Z  * [new tag]                 cslpull91                   -> cslpull91
2025-12-04T08:57:06.5120768Z  * [new tag]                 cslpull92                   -> cslpull92
2025-12-04T08:57:06.5122169Z  * [new tag]                 flight_5                    -> flight_5
2025-12-04T08:57:06.5123461Z  * [new tag]                 flight_5.1                  -> flight_5.1
2025-12-04T08:57:06.5124626Z  * [new tag]                 flight_5.2                  -> flight_5.2
2025-12-04T08:57:06.5126214Z  * [new tag]                 flight_5.3                  -> flight_5.3
2025-12-04T08:57:06.5127250Z  * [new tag]                 forpull1                    -> forpull1
2025-12-04T08:57:06.5128828Z  * [new tag]                 malfet/tag-2ef5611          -> malfet/tag-2ef5611
2025-12-04T08:57:06.5129984Z  * [new tag]                 malfet/tag-317b1a0          -> malfet/tag-317b1a0
2025-12-04T08:57:06.5131116Z  * [new tag]                 malfet/tag-ec6f767          -> malfet/tag-ec6f767
2025-12-04T08:57:06.5132379Z  * [new tag]                 nightly-binary              -> nightly-binary
2025-12-04T08:57:06.5133664Z  * [new tag]                 sqzhang_flight4_plus        -> sqzhang_flight4_plus
2025-12-04T08:57:06.5134957Z  * [new tag]                 sqzhang_flight_3            -> sqzhang_flight_3
2025-12-04T08:57:06.5136641Z  * [new tag]                 trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 -> trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272
2025-12-04T08:57:06.5137794Z  * [new tag]                 trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e -> trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e
2025-12-04T08:57:06.5139242Z  * [new tag]                 trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 -> trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88
2025-12-04T08:57:06.5141088Z  * [new tag]                 trunk/07dcc0b83db3211653a38565a24e15acdba75654 -> trunk/07dcc0b83db3211653a38565a24e15acdba75654
2025-12-04T08:57:06.5142196Z  * [new tag]                 trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb -> trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb
2025-12-04T08:57:06.5143460Z  * [new tag]                 trunk/088048f2fea28ff7d450f65c72419ca45780d30b -> trunk/088048f2fea28ff7d450f65c72419ca45780d30b
2025-12-04T08:57:06.5144832Z  * [new tag]                 trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 -> trunk/09076941a95c76f4d9ad189d064dfd8baa39e672
2025-12-04T08:57:06.5146017Z  * [new tag]                 trunk/0b80a4c62b94402844bf221791c096b0035c6d75 -> trunk/0b80a4c62b94402844bf221791c096b0035c6d75
2025-12-04T08:57:06.5147460Z  * [new tag]                 trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 -> trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2
2025-12-04T08:57:06.5148754Z  * [new tag]                 trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 -> trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5
2025-12-04T08:57:06.5150182Z  * [new tag]                 trunk/135f3753c418a6879b1954904184937b67e61688 -> trunk/135f3753c418a6879b1954904184937b67e61688
2025-12-04T08:57:06.5151333Z  * [new tag]                 trunk/15da21026cb13cd20257dc9e96830db108743c10 -> trunk/15da21026cb13cd20257dc9e96830db108743c10
2025-12-04T08:57:06.5152593Z  * [new tag]                 trunk/166efdad2ac827f30fb02504c6017520257f88ec -> trunk/166efdad2ac827f30fb02504c6017520257f88ec
2025-12-04T08:57:06.5153800Z  * [new tag]                 trunk/174272c15fae553d8488140af931f7d8050a313f -> trunk/174272c15fae553d8488140af931f7d8050a313f
2025-12-04T08:57:06.5156043Z  * [new tag]                 trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 -> trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11
2025-12-04T08:57:06.5157234Z  * [new tag]                 trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 -> trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63
2025-12-04T08:57:06.5158000Z  * [new tag]                 trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 -> trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5
2025-12-04T08:57:06.5159247Z  * [new tag]                 trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 -> trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676
2025-12-04T08:57:06.5160518Z  * [new tag]                 trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e -> trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e
2025-12-04T08:57:06.5161783Z  * [new tag]                 trunk/1c87554d74140eaee964ca8b1832cede67f5f520 -> trunk/1c87554d74140eaee964ca8b1832cede67f5f520
2025-12-04T08:57:06.5163083Z  * [new tag]                 trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 -> trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8
2025-12-04T08:57:06.5164418Z  * [new tag]                 trunk/1cee47d6ce0a02227185b566593f002dd639ca0c -> trunk/1cee47d6ce0a02227185b566593f002dd639ca0c
2025-12-04T08:57:06.5165544Z  * [new tag]                 trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d -> trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d
2025-12-04T08:57:06.5166821Z  * [new tag]                 trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 -> trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8
2025-12-04T08:57:06.5168120Z  * [new tag]                 trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de -> trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de
2025-12-04T08:57:06.5169389Z  * [new tag]                 trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 -> trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543
2025-12-04T08:57:06.5170682Z  * [new tag]                 trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 -> trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7
2025-12-04T08:57:06.5171912Z  * [new tag]                 trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f -> trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f
2025-12-04T08:57:06.5173274Z  * [new tag]                 trunk/285779b1621cf9f073a062b0889a642d200308d9 -> trunk/285779b1621cf9f073a062b0889a642d200308d9
2025-12-04T08:57:06.5174449Z  * [new tag]                 trunk/2887faaec6295d081580d09fce161201826c6d87 -> trunk/2887faaec6295d081580d09fce161201826c6d87
2025-12-04T08:57:06.5175709Z  * [new tag]                 trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc -> trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc
2025-12-04T08:57:06.5176939Z  * [new tag]                 trunk/29856679769b3dede478767e2fe6cfb51197cb25 -> trunk/29856679769b3dede478767e2fe6cfb51197cb25
2025-12-04T08:57:06.5178246Z  * [new tag]                 trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 -> trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563
2025-12-04T08:57:06.5179529Z  * [new tag]                 trunk/2ac3ef882afb23136adc188975f0a8802fc68adf -> trunk/2ac3ef882afb23136adc188975f0a8802fc68adf
2025-12-04T08:57:06.5180669Z  * [new tag]                 trunk/2bec68e73b64715354af076ad309335f943e36cd -> trunk/2bec68e73b64715354af076ad309335f943e36cd
2025-12-04T08:57:06.5181914Z  * [new tag]                 trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 -> trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1
2025-12-04T08:57:06.5183282Z  * [new tag]                 trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 -> trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708
2025-12-04T08:57:06.5184530Z  * [new tag]                 trunk/2df6058f116a65722a0e03073402feb242572d35 -> trunk/2df6058f116a65722a0e03073402feb242572d35
2025-12-04T08:57:06.5185811Z  * [new tag]                 trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec -> trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec
2025-12-04T08:57:06.5187350Z  * [new tag]                 trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 -> trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94
2025-12-04T08:57:06.5188543Z  * [new tag]                 trunk/305168768a95d69c444df5cd334bb774edfe06f1 -> trunk/305168768a95d69c444df5cd334bb774edfe06f1
2025-12-04T08:57:06.5189802Z  * [new tag]                 trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 -> trunk/31fc12773026e8e00f054dd79ad9b2491e693b48
2025-12-04T08:57:06.5191605Z  * [new tag]                 trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 -> trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991
2025-12-04T08:57:06.5192843Z  * [new tag]                 trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 -> trunk/3418bd29475dff06695045fcdf93e7d0dac67da8
2025-12-04T08:57:06.5194235Z  * [new tag]                 trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf -> trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf
2025-12-04T08:57:06.5195535Z  * [new tag]                 trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee -> trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee
2025-12-04T08:57:06.5196828Z  * [new tag]                 trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 -> trunk/39d07dbf03a911bdd45d1af78d8638dc92074938
2025-12-04T08:57:06.5197978Z  * [new tag]                 trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 -> trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725
2025-12-04T08:57:06.5199244Z  * [new tag]                 trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae -> trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae
2025-12-04T08:57:06.5200605Z  * [new tag]                 trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f -> trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f
2025-12-04T08:57:06.5201845Z  * [new tag]                 trunk/42e9005cda22da3f1c559c3649218cebd671027c -> trunk/42e9005cda22da3f1c559c3649218cebd671027c
2025-12-04T08:57:06.5203180Z  * [new tag]                 trunk/43b94713bbf340d3c124fde02d0f73add4021247 -> trunk/43b94713bbf340d3c124fde02d0f73add4021247
2025-12-04T08:57:06.5204393Z  * [new tag]                 trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c -> trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c
2025-12-04T08:57:06.5205636Z  * [new tag]                 trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a -> trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a
2025-12-04T08:57:06.5206922Z  * [new tag]                 trunk/45d310ad84854dff730c0b12e577d7998d978686 -> trunk/45d310ad84854dff730c0b12e577d7998d978686
2025-12-04T08:57:06.5208553Z  * [new tag]                 trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 -> trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54
2025-12-04T08:57:06.5209589Z  * [new tag]                 trunk/481e5ab336275bd3acd5fa8a611b05b4469012af -> trunk/481e5ab336275bd3acd5fa8a611b05b4469012af
2025-12-04T08:57:06.5210838Z  * [new tag]                 trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 -> trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96
2025-12-04T08:57:06.5212172Z  * [new tag]                 trunk/49a04d26088acc17d948ddd66920f3e16371e873 -> trunk/49a04d26088acc17d948ddd66920f3e16371e873
2025-12-04T08:57:06.5213422Z  * [new tag]                 trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 -> trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985
2025-12-04T08:57:06.5214564Z  * [new tag]                 trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f -> trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f
2025-12-04T08:57:06.5215965Z  * [new tag]                 trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa -> trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa
2025-12-04T08:57:06.5217355Z  * [new tag]                 trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c -> trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c
2025-12-04T08:57:06.5221042Z  * [new tag]                 trunk/4fefb8e7e942386ffac764a41b232241f82bea3a -> trunk/4fefb8e7e942386ffac764a41b232241f82bea3a
2025-12-04T08:57:06.5222096Z  * [new tag]                 trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d -> trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d
2025-12-04T08:57:06.5223301Z  * [new tag]                 trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 -> trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9
2025-12-04T08:57:06.5224565Z  * [new tag]                 trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 -> trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3
2025-12-04T08:57:06.5225894Z  * [new tag]                 trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a -> trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a
2025-12-04T08:57:06.5227217Z  * [new tag]                 trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 -> trunk/539ba711b029de9f191070f4f0d12f18f5b7f292
2025-12-04T08:57:06.5228495Z  * [new tag]                 trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 -> trunk/556375b55deebebbc56cb7aef81f4d52f031ba28
2025-12-04T08:57:06.5229870Z  * [new tag]                 trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 -> trunk/55c4ab554845481d0a69a3811937575fe8bb1a66
2025-12-04T08:57:06.5231184Z  * [new tag]                 trunk/5634469fda9e5d98869c82c7d03bb08914245f96 -> trunk/5634469fda9e5d98869c82c7d03bb08914245f96
2025-12-04T08:57:06.5232335Z  * [new tag]                 trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc -> trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc
2025-12-04T08:57:06.5233617Z  * [new tag]                 trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 -> trunk/587d63a3e07de5dc91065f9ef70bcacda9989068
2025-12-04T08:57:06.5234887Z  * [new tag]                 trunk/597930f6b568852356ca9795dac76f9e4653adbd -> trunk/597930f6b568852356ca9795dac76f9e4653adbd
2025-12-04T08:57:06.5236018Z  * [new tag]                 trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 -> trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6
2025-12-04T08:57:06.5237396Z  * [new tag]                 trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 -> trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883
2025-12-04T08:57:06.5238668Z  * [new tag]                 trunk/5a607febc04c3a2b5824c75f3f60307867439a2c -> trunk/5a607febc04c3a2b5824c75f3f60307867439a2c
2025-12-04T08:57:06.5240006Z  * [new tag]                 trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b -> trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b
2025-12-04T08:57:06.5241200Z  * [new tag]                 trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c -> trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c
2025-12-04T08:57:06.5242406Z  * [new tag]                 trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 -> trunk/5f21d27e71268464d362a96c9ac09ea475f7f202
2025-12-04T08:57:06.5243761Z  * [new tag]                 trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 -> trunk/5fafc13038c9988d9ac21fa793fbd5890604b447
2025-12-04T08:57:06.5245068Z  * [new tag]                 trunk/61be54a31dc09b59d99b62176fb935aee0b924ef -> trunk/61be54a31dc09b59d99b62176fb935aee0b924ef
2025-12-04T08:57:06.5246328Z  * [new tag]                 trunk/62d3ccd71484ed6a760d909b41487101bbc65719 -> trunk/62d3ccd71484ed6a760d909b41487101bbc65719
2025-12-04T08:57:06.5247629Z  * [new tag]                 trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b -> trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b
2025-12-04T08:57:06.5248914Z  * [new tag]                 trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a -> trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a
2025-12-04T08:57:06.5250221Z  * [new tag]                 trunk/66004b993744b4106bf8afaba71f3c228a804206 -> trunk/66004b993744b4106bf8afaba71f3c228a804206
2025-12-04T08:57:06.5251484Z  * [new tag]                 trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 -> trunk/6658a04c7ca67acb64512341342e7b3ee13ee386
2025-12-04T08:57:06.5252742Z  * [new tag]                 trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 -> trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4
2025-12-04T08:57:06.5254165Z  * [new tag]                 trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d -> trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d
2025-12-04T08:57:06.5255322Z  * [new tag]                 trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b -> trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b
2025-12-04T08:57:06.5256531Z  * [new tag]                 trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 -> trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5
2025-12-04T08:57:06.5257818Z  * [new tag]                 trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 -> trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8
2025-12-04T08:57:06.5259150Z  * [new tag]                 trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec -> trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec
2025-12-04T08:57:06.5260363Z  * [new tag]                 trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 -> trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71
2025-12-04T08:57:06.5261635Z  * [new tag]                 trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d -> trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d
2025-12-04T08:57:06.5262928Z  * [new tag]                 trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a -> trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a
2025-12-04T08:57:06.5264226Z  * [new tag]                 trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e -> trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e
2025-12-04T08:57:06.5265462Z  * [new tag]                 trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 -> trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8
2025-12-04T08:57:06.5266704Z  * [new tag]                 trunk/70d797a5fc109b20a517646fcaa819477cd0d485 -> trunk/70d797a5fc109b20a517646fcaa819477cd0d485
2025-12-04T08:57:06.5268010Z  * [new tag]                 trunk/7348cb355ff0a6f79cd4871215aea72185748734 -> trunk/7348cb355ff0a6f79cd4871215aea72185748734
2025-12-04T08:57:06.5269344Z  * [new tag]                 trunk/74fe26a1ebe32931783569f2e762e3c2c974901f -> trunk/74fe26a1ebe32931783569f2e762e3c2c974901f
2025-12-04T08:57:06.5270651Z  * [new tag]                 trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 -> trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696
2025-12-04T08:57:06.5271998Z  * [new tag]                 trunk/7741edd4ed665f3988052e260863efb508d61a03 -> trunk/7741edd4ed665f3988052e260863efb508d61a03
2025-12-04T08:57:06.5273656Z  * [new tag]                 trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 -> trunk/78adb3b3df41b45d2368b67226d2f864b78939a6
2025-12-04T08:57:06.5274842Z  * [new tag]                 trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 -> trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7
2025-12-04T08:57:06.5276343Z  * [new tag]                 trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 -> trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3
2025-12-04T08:57:06.5277615Z  * [new tag]                 trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca -> trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca
2025-12-04T08:57:06.5278881Z  * [new tag]                 trunk/7b7af390ea8541c611d1ce2018a6934188fc197b -> trunk/7b7af390ea8541c611d1ce2018a6934188fc197b
2025-12-04T08:57:06.5280219Z  * [new tag]                 trunk/7ba4680f3755a560af81aa0f688791e367aa3609 -> trunk/7ba4680f3755a560af81aa0f688791e367aa3609
2025-12-04T08:57:06.5281608Z  * [new tag]                 trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b -> trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b
2025-12-04T08:57:06.5282715Z  * [new tag]                 trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9
2025-12-04T08:57:06.5283957Z  * [new tag]                 trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 -> trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8
2025-12-04T08:57:06.5285181Z  * [new tag]                 trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed -> trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed
2025-12-04T08:57:06.5286471Z  * [new tag]                 trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 -> trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8
2025-12-04T08:57:06.5287854Z  * [new tag]                 trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e -> trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e
2025-12-04T08:57:06.5288923Z  * [new tag]                 trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead -> trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead
2025-12-04T08:57:06.5290212Z  * [new tag]                 trunk/81af382128efa094d8702e18f2c133760904c718 -> trunk/81af382128efa094d8702e18f2c133760904c718
2025-12-04T08:57:06.5291670Z  * [new tag]                 trunk/84149583d483e9c973c9a0feda70e4f3964947b0 -> trunk/84149583d483e9c973c9a0feda70e4f3964947b0
2025-12-04T08:57:06.5293274Z  * [new tag]                 trunk/85a315917efe82c24306be805c584ec044951c75 -> trunk/85a315917efe82c24306be805c584ec044951c75
2025-12-04T08:57:06.5294402Z  * [new tag]                 trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece -> trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece
2025-12-04T08:57:06.5295537Z  * [new tag]                 trunk/892640e25aeefa8007c5af837214b4502b6b62a6 -> trunk/892640e25aeefa8007c5af837214b4502b6b62a6
2025-12-04T08:57:06.5296924Z  * [new tag]                 trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 -> trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4
2025-12-04T08:57:06.5298165Z  * [new tag]                 trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c -> trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c
2025-12-04T08:57:06.5299425Z  * [new tag]                 trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 -> trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43
2025-12-04T08:57:06.5300791Z  * [new tag]                 trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 -> trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922
2025-12-04T08:57:06.5302122Z  * [new tag]                 trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca -> trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca
2025-12-04T08:57:06.5303461Z  * [new tag]                 trunk/90b27e7e8352cde97d32ddad24740ef819633f38 -> trunk/90b27e7e8352cde97d32ddad24740ef819633f38
2025-12-04T08:57:06.5304591Z  * [new tag]                 trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 -> trunk/90f0139e64b2951815d524b6a373bed20c4fbf90
2025-12-04T08:57:06.5305816Z  * [new tag]                 trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c -> trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c
2025-12-04T08:57:06.5307106Z  * [new tag]                 trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 -> trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87
2025-12-04T08:57:06.5308312Z  * [new tag]                 trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 -> trunk/9844fbeadd5cebdf1281d6fbf79164139c352693
2025-12-04T08:57:06.5309655Z  * [new tag]                 trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa -> trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa
2025-12-04T08:57:06.5310960Z  * [new tag]                 trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d -> trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d
2025-12-04T08:57:06.5312308Z  * [new tag]                 trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 -> trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639
2025-12-04T08:57:06.5313671Z  * [new tag]                 trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 -> trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8
2025-12-04T08:57:06.5314962Z  * [new tag]                 trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d -> trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d
2025-12-04T08:57:06.5316253Z  * [new tag]                 trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a -> trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a
2025-12-04T08:57:06.5317767Z  * [new tag]                 trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 -> trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742
2025-12-04T08:57:06.5319162Z  * [new tag]                 trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 -> trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098
2025-12-04T08:57:06.5320460Z  * [new tag]                 trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa -> trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa
2025-12-04T08:57:06.5321775Z  * [new tag]                 trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d -> trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d
2025-12-04T08:57:06.5323509Z  * [new tag]                 trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c -> trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c
2025-12-04T08:57:06.5324600Z  * [new tag]                 trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 -> trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90
2025-12-04T08:57:06.5325825Z  * [new tag]                 trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c -> trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c
2025-12-04T08:57:06.5326987Z  * [new tag]                 trunk/a7dc6dab9ad911259d4801c502907e531594db45 -> trunk/a7dc6dab9ad911259d4801c502907e531594db45
2025-12-04T08:57:06.5328445Z  * [new tag]                 trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 -> trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109
2025-12-04T08:57:06.5329689Z  * [new tag]                 trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e -> trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e
2025-12-04T08:57:06.5330972Z  * [new tag]                 trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e -> trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e
2025-12-04T08:57:06.5332132Z  * [new tag]                 trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e -> trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e
2025-12-04T08:57:06.5333344Z  * [new tag]                 trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 -> trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48
2025-12-04T08:57:06.5334631Z  * [new tag]                 trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 -> trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62
2025-12-04T08:57:06.5335963Z  * [new tag]                 trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 -> trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2
2025-12-04T08:57:06.5337317Z  * [new tag]                 trunk/b39813b4a04931682b0491adba2138d01d716d99 -> trunk/b39813b4a04931682b0491adba2138d01d716d99
2025-12-04T08:57:06.5338659Z  * [new tag]                 trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 -> trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24
2025-12-04T08:57:06.5340058Z  * [new tag]                 trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 -> trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7
2025-12-04T08:57:06.5341399Z  * [new tag]                 trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a -> trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a
2025-12-04T08:57:06.5342713Z  * [new tag]                 trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 -> trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417
2025-12-04T08:57:06.5344033Z  * [new tag]                 trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 -> trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4
2025-12-04T08:57:06.5345385Z  * [new tag]                 trunk/b7d60685f8cbc939b68a20871e90db67e729329b -> trunk/b7d60685f8cbc939b68a20871e90db67e729329b
2025-12-04T08:57:06.5346711Z  * [new tag]                 trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e -> trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e
2025-12-04T08:57:06.5348101Z  * [new tag]                 trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf -> trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf
2025-12-04T08:57:06.5349399Z  * [new tag]                 trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 -> trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5
2025-12-04T08:57:06.5350806Z  * [new tag]                 trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f -> trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f
2025-12-04T08:57:06.5352131Z  * [new tag]                 trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f -> trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f
2025-12-04T08:57:06.5353489Z  * [new tag]                 trunk/bb3034198b459401fabeab254e1b99f0115046e2 -> trunk/bb3034198b459401fabeab254e1b99f0115046e2
2025-12-04T08:57:06.5354769Z  * [new tag]                 trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 -> trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55
2025-12-04T08:57:06.5356317Z  * [new tag]                 trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 -> trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8
2025-12-04T08:57:06.5357698Z  * [new tag]                 trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 -> trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09
2025-12-04T08:57:06.5358972Z  * [new tag]                 trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 -> trunk/bea4912944defdbcb8b061800caab6cbbbd01df5
2025-12-04T08:57:06.5361069Z  * [new tag]                 trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 -> trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564
2025-12-04T08:57:06.5362295Z  * [new tag]                 trunk/c0660bcee27e7d7731634e274576a7081882bede -> trunk/c0660bcee27e7d7731634e274576a7081882bede
2025-12-04T08:57:06.5363625Z  * [new tag]                 trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac -> trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac
2025-12-04T08:57:06.5364949Z  * [new tag]                 trunk/c55b1e8f61d041ee436d697449eb028931d574fb -> trunk/c55b1e8f61d041ee436d697449eb028931d574fb
2025-12-04T08:57:06.5366182Z  * [new tag]                 trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 -> trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1
2025-12-04T08:57:06.5367790Z  * [new tag]                 trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 -> trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0
2025-12-04T08:57:06.5369025Z  * [new tag]                 trunk/cc0853af42122f8185321f542616f4474e717f09 -> trunk/cc0853af42122f8185321f542616f4474e717f09
2025-12-04T08:57:06.5370254Z  * [new tag]                 trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 -> trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9
2025-12-04T08:57:06.5371615Z  * [new tag]                 trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a -> trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a
2025-12-04T08:57:06.5372965Z  * [new tag]                 trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace -> trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace
2025-12-04T08:57:06.5374260Z  * [new tag]                 trunk/d16447dacaf2420ea175f0c275c75da951f57d39 -> trunk/d16447dacaf2420ea175f0c275c75da951f57d39
2025-12-04T08:57:06.5375537Z  * [new tag]                 trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 -> trunk/d19f1e8cab6810bb2e99141f9976665954c67a50
2025-12-04T08:57:06.5376852Z  * [new tag]                 trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 -> trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01
2025-12-04T08:57:06.5378250Z  * [new tag]                 trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf -> trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf
2025-12-04T08:57:06.5379601Z  * [new tag]                 trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 -> trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8
2025-12-04T08:57:06.5380828Z  * [new tag]                 trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d -> trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d
2025-12-04T08:57:06.5382166Z  * [new tag]                 trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 -> trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47
2025-12-04T08:57:06.5383454Z  * [new tag]                 trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 -> trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1
2025-12-04T08:57:06.5384775Z  * [new tag]                 trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e -> trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e
2025-12-04T08:57:06.5386210Z  * [new tag]                 trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a -> trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a
2025-12-04T08:57:06.5387515Z  * [new tag]                 trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b -> trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b
2025-12-04T08:57:06.5388824Z  * [new tag]                 trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec -> trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec
2025-12-04T08:57:06.5390142Z  * [new tag]                 trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf -> trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf
2025-12-04T08:57:06.5391462Z  * [new tag]                 trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd -> trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd
2025-12-04T08:57:06.5392809Z  * [new tag]                 trunk/dd18a75336a4fbd7497955cc5665904724fce889 -> trunk/dd18a75336a4fbd7497955cc5665904724fce889
2025-12-04T08:57:06.5394243Z  * [new tag]                 trunk/ded9bcd61a059bf723e6e84689552962b480ea77 -> trunk/ded9bcd61a059bf723e6e84689552962b480ea77
2025-12-04T08:57:06.5395731Z  * [new tag]                 trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c -> trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c
2025-12-04T08:57:06.5397081Z  * [new tag]                 trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b -> trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b
2025-12-04T08:57:06.5398257Z  * [new tag]                 trunk/e3f24fd73ad74c6e7176687986436956c7c18235 -> trunk/e3f24fd73ad74c6e7176687986436956c7c18235
2025-12-04T08:57:06.5399708Z  * [new tag]                 trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e -> trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e
2025-12-04T08:57:06.5401223Z  * [new tag]                 trunk/ea7035f462a0d2830865ee86c832bd101e1427fc -> trunk/ea7035f462a0d2830865ee86c832bd101e1427fc
2025-12-04T08:57:06.5402488Z  * [new tag]                 trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 -> trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3
2025-12-04T08:57:06.5403772Z  * [new tag]                 trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf -> trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf
2025-12-04T08:57:06.5405102Z  * [new tag]                 trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e -> trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e
2025-12-04T08:57:06.5406393Z  * [new tag]                 trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e -> trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e
2025-12-04T08:57:06.5408150Z  * [new tag]                 trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 -> trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2
2025-12-04T08:57:06.5409447Z  * [new tag]                 trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 -> trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4
2025-12-04T08:57:06.5410774Z  * [new tag]                 trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 -> trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53
2025-12-04T08:57:06.5412067Z  * [new tag]                 trunk/f1076f5510920044912247b1abb8760cb820f598 -> trunk/f1076f5510920044912247b1abb8760cb820f598
2025-12-04T08:57:06.5413394Z  * [new tag]                 trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 -> trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40
2025-12-04T08:57:06.5414706Z  * [new tag]                 trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 -> trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56
2025-12-04T08:57:06.5416024Z  * [new tag]                 trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 -> trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8
2025-12-04T08:57:06.5417433Z  * [new tag]                 trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 -> trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467
2025-12-04T08:57:06.5418810Z  * [new tag]                 trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 -> trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17
2025-12-04T08:57:06.5420176Z  * [new tag]                 trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 -> trunk/f7e1bd80a063e17453c361837ba6ea2570920a73
2025-12-04T08:57:06.5421374Z  * [new tag]                 trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 -> trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7
2025-12-04T08:57:06.5422705Z  * [new tag]                 trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b -> trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b
2025-12-04T08:57:06.5424092Z  * [new tag]                 trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 -> trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7
2025-12-04T08:57:06.5425817Z  * [new tag]                 trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 -> trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307
2025-12-04T08:57:06.5427111Z  * [new tag]                 trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 -> trunk/fec710bf89173f5355468a7ce1afe9157c3d9009
2025-12-04T08:57:06.5428433Z  * [new tag]                 trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 -> trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T08:57:06.5429705Z  * [new tag]                 v0.1.1                      -> v0.1.1
2025-12-04T08:57:06.5430862Z  * [new tag]                 v0.1.10                     -> v0.1.10
2025-12-04T08:57:06.5431914Z  * [new tag]                 v0.1.11                     -> v0.1.11
2025-12-04T08:57:06.5433126Z  * [new tag]                 v0.1.12                     -> v0.1.12
2025-12-04T08:57:06.5434337Z  * [new tag]                 v0.1.2                      -> v0.1.2
2025-12-04T08:57:06.5435531Z  * [new tag]                 v0.1.3                      -> v0.1.3
2025-12-04T08:57:06.5436705Z  * [new tag]                 v0.1.4                      -> v0.1.4
2025-12-04T08:57:06.5437926Z  * [new tag]                 v0.1.5                      -> v0.1.5
2025-12-04T08:57:06.5439144Z  * [new tag]                 v0.1.6                      -> v0.1.6
2025-12-04T08:57:06.5440492Z  * [new tag]                 v0.1.7                      -> v0.1.7
2025-12-04T08:57:06.5441719Z  * [new tag]                 v0.1.8                      -> v0.1.8
2025-12-04T08:57:06.5442889Z  * [new tag]                 v0.1.9                      -> v0.1.9
2025-12-04T08:57:06.5444136Z  * [new tag]                 v0.2.0                      -> v0.2.0
2025-12-04T08:57:06.5445428Z  * [new tag]                 v0.3.0                      -> v0.3.0
2025-12-04T08:57:06.5446673Z  * [new tag]                 v0.3.1                      -> v0.3.1
2025-12-04T08:57:06.5447879Z  * [new tag]                 v0.4.0                      -> v0.4.0
2025-12-04T08:57:06.5449109Z  * [new tag]                 v0.4.1                      -> v0.4.1
2025-12-04T08:57:06.5450346Z  * [new tag]                 v1.0.0                      -> v1.0.0
2025-12-04T08:57:06.5451590Z  * [new tag]                 v1.0.0a0                    -> v1.0.0a0
2025-12-04T08:57:06.5452835Z  * [new tag]                 v1.0.1                      -> v1.0.1
2025-12-04T08:57:06.5454074Z  * [new tag]                 v1.0rc0                     -> v1.0rc0
2025-12-04T08:57:06.5455232Z  * [new tag]                 v1.0rc1                     -> v1.0rc1
2025-12-04T08:57:06.5456397Z  * [new tag]                 v1.1.0                      -> v1.1.0
2025-12-04T08:57:06.5457624Z  * [new tag]                 v1.1.0a0                    -> v1.1.0a0
2025-12-04T08:57:06.5459037Z  * [new tag]                 v1.10.0                     -> v1.10.0
2025-12-04T08:57:06.5460453Z  * [new tag]                 v1.10.0-rc1                 -> v1.10.0-rc1
2025-12-04T08:57:06.5461690Z  * [new tag]                 v1.10.0-rc2                 -> v1.10.0-rc2
2025-12-04T08:57:06.5462785Z  * [new tag]                 v1.10.0-rc3                 -> v1.10.0-rc3
2025-12-04T08:57:06.5464397Z  * [new tag]                 v1.10.1                     -> v1.10.1
2025-12-04T08:57:06.5465549Z  * [new tag]                 v1.10.1-rc1                 -> v1.10.1-rc1
2025-12-04T08:57:06.5466567Z  * [new tag]                 v1.10.2                     -> v1.10.2
2025-12-04T08:57:06.5467520Z  * [new tag]                 v1.10.2-rc1                 -> v1.10.2-rc1
2025-12-04T08:57:06.5468812Z  * [new tag]                 v1.11.0                     -> v1.11.0
2025-12-04T08:57:06.5470175Z  * [new tag]                 v1.11.0-rc1                 -> v1.11.0-rc1
2025-12-04T08:57:06.5471448Z  * [new tag]                 v1.11.0-rc2                 -> v1.11.0-rc2
2025-12-04T08:57:06.5472839Z  * [new tag]                 v1.11.0-rc3                 -> v1.11.0-rc3
2025-12-04T08:57:06.5474143Z  * [new tag]                 v1.11.0-rc4                 -> v1.11.0-rc4
2025-12-04T08:57:06.5475411Z  * [new tag]                 v1.11.0-rc5                 -> v1.11.0-rc5
2025-12-04T08:57:06.5476527Z  * [new tag]                 v1.11.0-rc6                 -> v1.11.0-rc6
2025-12-04T08:57:06.5477566Z  * [new tag]                 v1.11.0-rc7                 -> v1.11.0-rc7
2025-12-04T08:57:06.5478803Z  * [new tag]                 v1.12.0                     -> v1.12.0
2025-12-04T08:57:06.5480085Z  * [new tag]                 v1.12.0-rc1                 -> v1.12.0-rc1
2025-12-04T08:57:06.5481522Z  * [new tag]                 v1.12.0-rc2                 -> v1.12.0-rc2
2025-12-04T08:57:06.5482757Z  * [new tag]                 v1.12.0-rc3                 -> v1.12.0-rc3
2025-12-04T08:57:06.5484012Z  * [new tag]                 v1.12.0-rc4                 -> v1.12.0-rc4
2025-12-04T08:57:06.5485241Z  * [new tag]                 v1.12.0-rc5                 -> v1.12.0-rc5
2025-12-04T08:57:06.5486511Z  * [new tag]                 v1.12.0-rc6                 -> v1.12.0-rc6
2025-12-04T08:57:06.5487683Z  * [new tag]                 v1.12.0-rc7                 -> v1.12.0-rc7
2025-12-04T08:57:06.5488783Z  * [new tag]                 v1.12.0-rc8                 -> v1.12.0-rc8
2025-12-04T08:57:06.5489816Z  * [new tag]                 v1.12.1                     -> v1.12.1
2025-12-04T08:57:06.5491111Z  * [new tag]                 v1.12.1-rc1                 -> v1.12.1-rc1
2025-12-04T08:57:06.5492363Z  * [new tag]                 v1.12.1-rc2                 -> v1.12.1-rc2
2025-12-04T08:57:06.5493683Z  * [new tag]                 v1.12.1-rc3                 -> v1.12.1-rc3
2025-12-04T08:57:06.5494965Z  * [new tag]                 v1.12.1-rc4                 -> v1.12.1-rc4
2025-12-04T08:57:06.5496072Z  * [new tag]                 v1.12.1-rc5                 -> v1.12.1-rc5
2025-12-04T08:57:06.5497305Z  * [new tag]                 v1.13.0                     -> v1.13.0
2025-12-04T08:57:06.5498509Z  * [new tag]                 v1.13.0-rc1                 -> v1.13.0-rc1
2025-12-04T08:57:06.5499718Z  * [new tag]                 v1.13.0-rc2                 -> v1.13.0-rc2
2025-12-04T08:57:06.5500897Z  * [new tag]                 v1.13.0-rc3                 -> v1.13.0-rc3
2025-12-04T08:57:06.5502275Z  * [new tag]                 v1.13.0-rc4                 -> v1.13.0-rc4
2025-12-04T08:57:06.5503395Z  * [new tag]                 v1.13.0-rc5                 -> v1.13.0-rc5
2025-12-04T08:57:06.5504398Z  * [new tag]                 v1.13.0-rc6                 -> v1.13.0-rc6
2025-12-04T08:57:06.5505673Z  * [new tag]                 v1.13.1                     -> v1.13.1
2025-12-04T08:57:06.5506801Z  * [new tag]                 v1.13.1-rc1                 -> v1.13.1-rc1
2025-12-04T08:57:06.5507989Z  * [new tag]                 v1.2.0                      -> v1.2.0
2025-12-04T08:57:06.5509241Z  * [new tag]                 v1.2.0a0                    -> v1.2.0a0
2025-12-04T08:57:06.5510453Z  * [new tag]                 v1.3.0                      -> v1.3.0
2025-12-04T08:57:06.5511734Z  * [new tag]                 v1.3.0a0                    -> v1.3.0a0
2025-12-04T08:57:06.5512844Z  * [new tag]                 v1.3.1                      -> v1.3.1
2025-12-04T08:57:06.5514017Z  * [new tag]                 v1.4.0                      -> v1.4.0
2025-12-04T08:57:06.5515225Z  * [new tag]                 v1.4.0a0                    -> v1.4.0a0
2025-12-04T08:57:06.5516329Z  * [new tag]                 v1.4.1                      -> v1.4.1
2025-12-04T08:57:06.5517710Z  * [new tag]                 v1.5.0                      -> v1.5.0
2025-12-04T08:57:06.5519024Z  * [new tag]                 v1.5.0-rc1                  -> v1.5.0-rc1
2025-12-04T08:57:06.5520453Z  * [new tag]                 v1.5.0-rc2                  -> v1.5.0-rc2
2025-12-04T08:57:06.5521746Z  * [new tag]                 v1.5.0-rc3                  -> v1.5.0-rc3
2025-12-04T08:57:06.5522921Z  * [new tag]                 v1.5.0-rc4                  -> v1.5.0-rc4
2025-12-04T08:57:06.5524041Z  * [new tag]                 v1.5.0-rc5                  -> v1.5.0-rc5
2025-12-04T08:57:06.5525265Z  * [new tag]                 v1.5.1                      -> v1.5.1
2025-12-04T08:57:06.5526394Z  * [new tag]                 v1.5.1-rc1                  -> v1.5.1-rc1
2025-12-04T08:57:06.5527396Z  * [new tag]                 v1.6.0                      -> v1.6.0
2025-12-04T08:57:06.5528662Z  * [new tag]                 v1.6.0-rc1                  -> v1.6.0-rc1
2025-12-04T08:57:06.5529932Z  * [new tag]                 v1.6.0-rc2                  -> v1.6.0-rc2
2025-12-04T08:57:06.5531198Z  * [new tag]                 v1.6.0-rc3                  -> v1.6.0-rc3
2025-12-04T08:57:06.5532673Z  * [new tag]                 v1.6.0-rc4                  -> v1.6.0-rc4
2025-12-04T08:57:06.5533924Z  * [new tag]                 v1.6.0-rc5                  -> v1.6.0-rc5
2025-12-04T08:57:06.5535106Z  * [new tag]                 v1.6.0-rc6                  -> v1.6.0-rc6
2025-12-04T08:57:06.5536198Z  * [new tag]                 v1.6.0-rc7                  -> v1.6.0-rc7
2025-12-04T08:57:06.5537372Z  * [new tag]                 v1.7.0                      -> v1.7.0
2025-12-04T08:57:06.5538614Z  * [new tag]                 v1.7.0-rc1                  -> v1.7.0-rc1
2025-12-04T08:57:06.5540004Z  * [new tag]                 v1.7.0-rc2                  -> v1.7.0-rc2
2025-12-04T08:57:06.5541235Z  * [new tag]                 v1.7.0-rc3                  -> v1.7.0-rc3
2025-12-04T08:57:06.5542691Z  * [new tag]                 v1.7.0-rc4                  -> v1.7.0-rc4
2025-12-04T08:57:06.5543996Z  * [new tag]                 v1.7.1                      -> v1.7.1
2025-12-04T08:57:06.5545344Z  * [new tag]                 v1.7.1-rc1                  -> v1.7.1-rc1
2025-12-04T08:57:06.5546627Z  * [new tag]                 v1.7.1-rc2                  -> v1.7.1-rc2
2025-12-04T08:57:06.5547776Z  * [new tag]                 v1.7.1-rc3                  -> v1.7.1-rc3
2025-12-04T08:57:06.5548978Z  * [new tag]                 v1.8.0                      -> v1.8.0
2025-12-04T08:57:06.5550100Z  * [new tag]                 v1.8.0-rc1                  -> v1.8.0-rc1
2025-12-04T08:57:06.5551311Z  * [new tag]                 v1.8.0-rc2                  -> v1.8.0-rc2
2025-12-04T08:57:06.5552574Z  * [new tag]                 v1.8.0-rc3                  -> v1.8.0-rc3
2025-12-04T08:57:06.5553794Z  * [new tag]                 v1.8.0-rc4                  -> v1.8.0-rc4
2025-12-04T08:57:06.5554894Z  * [new tag]                 v1.8.0-rc5                  -> v1.8.0-rc5
2025-12-04T08:57:06.5555994Z  * [new tag]                 v1.8.1                      -> v1.8.1
2025-12-04T08:57:06.5557197Z  * [new tag]                 v1.8.1-rc1                  -> v1.8.1-rc1
2025-12-04T08:57:06.5558334Z  * [new tag]                 v1.8.1-rc2                  -> v1.8.1-rc2
2025-12-04T08:57:06.5559472Z  * [new tag]                 v1.8.1-rc3                  -> v1.8.1-rc3
2025-12-04T08:57:06.5561188Z  * [new tag]                 v1.8.2                      -> v1.8.2
2025-12-04T08:57:06.5562324Z  * [new tag]                 v1.8.2-rc1                  -> v1.8.2-rc1
2025-12-04T08:57:06.5563502Z  * [new tag]                 v1.9.0                      -> v1.9.0
2025-12-04T08:57:06.5564762Z  * [new tag]                 v1.9.0-rc1                  -> v1.9.0-rc1
2025-12-04T08:57:06.5566041Z  * [new tag]                 v1.9.0-rc2                  -> v1.9.0-rc2
2025-12-04T08:57:06.5567344Z  * [new tag]                 v1.9.0-rc3                  -> v1.9.0-rc3
2025-12-04T08:57:06.5568494Z  * [new tag]                 v1.9.0-rc4                  -> v1.9.0-rc4
2025-12-04T08:57:06.5569685Z  * [new tag]                 v1.9.1                      -> v1.9.1
2025-12-04T08:57:06.5571061Z  * [new tag]                 v1.9.1-rc1                  -> v1.9.1-rc1
2025-12-04T08:57:06.5572209Z  * [new tag]                 v1.9.1-rc2                  -> v1.9.1-rc2
2025-12-04T08:57:06.5573442Z  * [new tag]                 v2.0.0                      -> v2.0.0
2025-12-04T08:57:06.5574667Z  * [new tag]                 v2.0.0-rc1                  -> v2.0.0-rc1
2025-12-04T08:57:06.5575996Z  * [new tag]                 v2.0.0-rc2                  -> v2.0.0-rc2
2025-12-04T08:57:06.5577278Z  * [new tag]                 v2.0.0-rc3                  -> v2.0.0-rc3
2025-12-04T08:57:06.5578507Z  * [new tag]                 v2.0.0-rc4                  -> v2.0.0-rc4
2025-12-04T08:57:06.5579797Z  * [new tag]                 v2.0.0-rc5                  -> v2.0.0-rc5
2025-12-04T08:57:06.5580907Z  * [new tag]                 v2.0.0-rc6                  -> v2.0.0-rc6
2025-12-04T08:57:06.5582154Z  * [new tag]                 v2.0.1                      -> v2.0.1
2025-12-04T08:57:06.5583552Z  * [new tag]                 v2.0.1-rc1                  -> v2.0.1-rc1
2025-12-04T08:57:06.5584519Z  * [new tag]                 v2.0.1-rc2                  -> v2.0.1-rc2
2025-12-04T08:57:06.5585747Z  * [new tag]                 v2.0.1-rc3                  -> v2.0.1-rc3
2025-12-04T08:57:06.5586850Z  * [new tag]                 v2.0.1-rc4                  -> v2.0.1-rc4
2025-12-04T08:57:06.5588430Z  * [new tag]                 v2.1.0                      -> v2.1.0
2025-12-04T08:57:06.5589668Z  * [new tag]                 v2.1.0-rc1                  -> v2.1.0-rc1
2025-12-04T08:57:06.5590933Z  * [new tag]                 v2.1.0-rc2                  -> v2.1.0-rc2
2025-12-04T08:57:06.5592221Z  * [new tag]                 v2.1.0-rc3                  -> v2.1.0-rc3
2025-12-04T08:57:06.5593643Z  * [new tag]                 v2.1.0-rc4                  -> v2.1.0-rc4
2025-12-04T08:57:06.5594927Z  * [new tag]                 v2.1.0-rc5                  -> v2.1.0-rc5
2025-12-04T08:57:06.5596065Z  * [new tag]                 v2.1.0-rc6                  -> v2.1.0-rc6
2025-12-04T08:57:06.5597278Z  * [new tag]                 v2.1.1                      -> v2.1.1
2025-12-04T08:57:06.5598652Z  * [new tag]                 v2.1.1-rc1                  -> v2.1.1-rc1
2025-12-04T08:57:06.5600114Z  * [new tag]                 v2.1.1-rc2                  -> v2.1.1-rc2
2025-12-04T08:57:06.5601535Z  * [new tag]                 v2.1.1-rc3                  -> v2.1.1-rc3
2025-12-04T08:57:06.5602818Z  * [new tag]                 v2.1.1-rc4                  -> v2.1.1-rc4
2025-12-04T08:57:06.5604047Z  * [new tag]                 v2.1.1-rc5                  -> v2.1.1-rc5
2025-12-04T08:57:06.5605173Z  * [new tag]                 v2.1.1-rc6                  -> v2.1.1-rc6
2025-12-04T08:57:06.5606344Z  * [new tag]                 v2.1.2                      -> v2.1.2
2025-12-04T08:57:06.5607647Z  * [new tag]                 v2.1.2-rc1                  -> v2.1.2-rc1
2025-12-04T08:57:06.5608972Z  * [new tag]                 v2.1.2-rc2                  -> v2.1.2-rc2
2025-12-04T08:57:06.5610114Z  * [new tag]                 v2.1.2-rc3                  -> v2.1.2-rc3
2025-12-04T08:57:06.5611333Z  * [new tag]                 v2.2.0                      -> v2.2.0
2025-12-04T08:57:06.5612565Z  * [new tag]                 v2.2.0-rc1                  -> v2.2.0-rc1
2025-12-04T08:57:06.5613782Z  * [new tag]                 v2.2.0-rc2                  -> v2.2.0-rc2
2025-12-04T08:57:06.5615022Z  * [new tag]                 v2.2.0-rc3                  -> v2.2.0-rc3
2025-12-04T08:57:06.5616255Z  * [new tag]                 v2.2.0-rc4                  -> v2.2.0-rc4
2025-12-04T08:57:06.5617705Z  * [new tag]                 v2.2.0-rc5                  -> v2.2.0-rc5
2025-12-04T08:57:06.5618989Z  * [new tag]                 v2.2.0-rc6                  -> v2.2.0-rc6
2025-12-04T08:57:06.5620076Z  * [new tag]                 v2.2.0-rc7                  -> v2.2.0-rc7
2025-12-04T08:57:06.5621159Z  * [new tag]                 v2.2.0-rc8                  -> v2.2.0-rc8
2025-12-04T08:57:06.5622853Z  * [new tag]                 v2.2.1                      -> v2.2.1
2025-12-04T08:57:06.5624156Z  * [new tag]                 v2.2.1-rc1                  -> v2.2.1-rc1
2025-12-04T08:57:06.5625423Z  * [new tag]                 v2.2.1-rc2                  -> v2.2.1-rc2
2025-12-04T08:57:06.5626588Z  * [new tag]                 v2.2.1-rc3                  -> v2.2.1-rc3
2025-12-04T08:57:06.5627678Z  * [new tag]                 v2.2.2                      -> v2.2.2
2025-12-04T08:57:06.5628964Z  * [new tag]                 v2.2.2-rc1                  -> v2.2.2-rc1
2025-12-04T08:57:06.5630111Z  * [new tag]                 v2.2.2-rc2                  -> v2.2.2-rc2
2025-12-04T08:57:06.5631119Z  * [new tag]                 v2.2.2-rc3                  -> v2.2.2-rc3
2025-12-04T08:57:06.5632421Z  * [new tag]                 v2.3.0                      -> v2.3.0
2025-12-04T08:57:06.5633688Z  * [new tag]                 v2.3.0-rc1                  -> v2.3.0-rc1
2025-12-04T08:57:06.5635123Z  * [new tag]                 v2.3.0-rc10                 -> v2.3.0-rc10
2025-12-04T08:57:06.5636561Z  * [new tag]                 v2.3.0-rc11                 -> v2.3.0-rc11
2025-12-04T08:57:06.5637702Z  * [new tag]                 v2.3.0-rc12                 -> v2.3.0-rc12
2025-12-04T08:57:06.5638951Z  * [new tag]                 v2.3.0-rc2                  -> v2.3.0-rc2
2025-12-04T08:57:06.5640436Z  * [new tag]                 v2.3.0-rc3                  -> v2.3.0-rc3
2025-12-04T08:57:06.5641713Z  * [new tag]                 v2.3.0-rc4                  -> v2.3.0-rc4
2025-12-04T08:57:06.5642979Z  * [new tag]                 v2.3.0-rc5                  -> v2.3.0-rc5
2025-12-04T08:57:06.5644103Z  * [new tag]                 v2.3.0-rc6                  -> v2.3.0-rc6
2025-12-04T08:57:06.5645284Z  * [new tag]                 v2.3.0-rc7                  -> v2.3.0-rc7
2025-12-04T08:57:06.5646589Z  * [new tag]                 v2.3.0-rc8                  -> v2.3.0-rc8
2025-12-04T08:57:06.5647761Z  * [new tag]                 v2.3.0-rc9                  -> v2.3.0-rc9
2025-12-04T08:57:06.5648847Z  * [new tag]                 v2.3.1                      -> v2.3.1
2025-12-04T08:57:06.5650080Z  * [new tag]                 v2.3.1-rc1                  -> v2.3.1-rc1
2025-12-04T08:57:06.5651349Z  * [new tag]                 v2.3.1-rc2                  -> v2.3.1-rc2
2025-12-04T08:57:06.5652694Z  * [new tag]                 v2.3.1-rc3                  -> v2.3.1-rc3
2025-12-04T08:57:06.5654056Z  * [new tag]                 v2.4.0                      -> v2.4.0
2025-12-04T08:57:06.5655324Z  * [new tag]                 v2.4.0-rc1                  -> v2.4.0-rc1
2025-12-04T08:57:06.5656574Z  * [new tag]                 v2.4.0-rc2                  -> v2.4.0-rc2
2025-12-04T08:57:06.5657836Z  * [new tag]                 v2.4.0-rc3                  -> v2.4.0-rc3
2025-12-04T08:57:06.5659079Z  * [new tag]                 v2.4.0-rc4                  -> v2.4.0-rc4
2025-12-04T08:57:06.5660371Z  * [new tag]                 v2.4.0-rc5                  -> v2.4.0-rc5
2025-12-04T08:57:06.5661645Z  * [new tag]                 v2.4.0-rc6                  -> v2.4.0-rc6
2025-12-04T08:57:06.5662885Z  * [new tag]                 v2.4.0-rc7                  -> v2.4.0-rc7
2025-12-04T08:57:06.5664176Z  * [new tag]                 v2.4.0-rc8                  -> v2.4.0-rc8
2025-12-04T08:57:06.5665459Z  * [new tag]                 v2.4.0-rc9                  -> v2.4.0-rc9
2025-12-04T08:57:06.5666621Z  * [new tag]                 v2.4.1                      -> v2.4.1
2025-12-04T08:57:06.5667893Z  * [new tag]                 v2.4.1-rc1                  -> v2.4.1-rc1
2025-12-04T08:57:06.5669170Z  * [new tag]                 v2.4.1-rc2                  -> v2.4.1-rc2
2025-12-04T08:57:06.5670464Z  * [new tag]                 v2.4.1-rc3                  -> v2.4.1-rc3
2025-12-04T08:57:06.5671741Z  * [new tag]                 v2.5.0                      -> v2.5.0
2025-12-04T08:57:06.5672960Z  * [new tag]                 v2.5.0-rc1                  -> v2.5.0-rc1
2025-12-04T08:57:06.5674074Z  * [new tag]                 v2.5.0-rc10                 -> v2.5.0-rc10
2025-12-04T08:57:06.5675292Z  * [new tag]                 v2.5.0-rc2                  -> v2.5.0-rc2
2025-12-04T08:57:06.5676556Z  * [new tag]                 v2.5.0-rc3                  -> v2.5.0-rc3
2025-12-04T08:57:06.5677794Z  * [new tag]                 v2.5.0-rc4                  -> v2.5.0-rc4
2025-12-04T08:57:06.5679074Z  * [new tag]                 v2.5.0-rc5                  -> v2.5.0-rc5
2025-12-04T08:57:06.5680561Z  * [new tag]                 v2.5.0-rc6                  -> v2.5.0-rc6
2025-12-04T08:57:06.5681831Z  * [new tag]                 v2.5.0-rc7                  -> v2.5.0-rc7
2025-12-04T08:57:06.5683123Z  * [new tag]                 v2.5.0-rc8                  -> v2.5.0-rc8
2025-12-04T08:57:06.5684377Z  * [new tag]                 v2.5.0-rc9                  -> v2.5.0-rc9
2025-12-04T08:57:06.5685471Z  * [new tag]                 v2.5.1                      -> v2.5.1
2025-12-04T08:57:06.5686611Z  * [new tag]                 v2.5.1-rc1                  -> v2.5.1-rc1
2025-12-04T08:57:06.5687509Z  * [new tag]                 v2.6.0                      -> v2.6.0
2025-12-04T08:57:06.5688853Z  * [new tag]                 v2.6.0-rc1                  -> v2.6.0-rc1
2025-12-04T08:57:06.5690234Z  * [new tag]                 v2.6.0-rc2                  -> v2.6.0-rc2
2025-12-04T08:57:06.5691508Z  * [new tag]                 v2.6.0-rc3                  -> v2.6.0-rc3
2025-12-04T08:57:06.5692746Z  * [new tag]                 v2.6.0-rc4                  -> v2.6.0-rc4
2025-12-04T08:57:06.5694214Z  * [new tag]                 v2.6.0-rc5                  -> v2.6.0-rc5
2025-12-04T08:57:06.5695580Z  * [new tag]                 v2.6.0-rc6                  -> v2.6.0-rc6
2025-12-04T08:57:06.5696885Z  * [new tag]                 v2.6.0-rc7                  -> v2.6.0-rc7
2025-12-04T08:57:06.5698191Z  * [new tag]                 v2.6.0-rc8                  -> v2.6.0-rc8
2025-12-04T08:57:06.5699526Z  * [new tag]                 v2.6.0-rc9                  -> v2.6.0-rc9
2025-12-04T08:57:06.5700991Z  * [new tag]                 v2.7.0                      -> v2.7.0
2025-12-04T08:57:06.5702261Z  * [new tag]                 v2.7.0-rc1                  -> v2.7.0-rc1
2025-12-04T08:57:06.5703770Z  * [new tag]                 v2.7.0-rc10                 -> v2.7.0-rc10
2025-12-04T08:57:06.5705090Z  * [new tag]                 v2.7.0-rc2                  -> v2.7.0-rc2
2025-12-04T08:57:06.5706382Z  * [new tag]                 v2.7.0-rc3                  -> v2.7.0-rc3
2025-12-04T08:57:06.5707660Z  * [new tag]                 v2.7.0-rc4                  -> v2.7.0-rc4
2025-12-04T08:57:06.5708946Z  * [new tag]                 v2.7.0-rc5                  -> v2.7.0-rc5
2025-12-04T08:57:06.5710212Z  * [new tag]                 v2.7.0-rc6                  -> v2.7.0-rc6
2025-12-04T08:57:06.5711503Z  * [new tag]                 v2.7.0-rc7                  -> v2.7.0-rc7
2025-12-04T08:57:06.5712810Z  * [new tag]                 v2.7.0-rc8                  -> v2.7.0-rc8
2025-12-04T08:57:06.5714225Z  * [new tag]                 v2.7.0-rc9                  -> v2.7.0-rc9
2025-12-04T08:57:06.5715361Z  * [new tag]                 v2.7.1                      -> v2.7.1
2025-12-04T08:57:06.5716677Z  * [new tag]                 v2.7.1-rc1                  -> v2.7.1-rc1
2025-12-04T08:57:06.5720516Z  * [new tag]                 v2.7.1-rc2                  -> v2.7.1-rc2
2025-12-04T08:57:06.5721939Z  * [new tag]                 v2.7.1-rc3                  -> v2.7.1-rc3
2025-12-04T08:57:06.5723221Z  * [new tag]                 v2.7.1-rc4                  -> v2.7.1-rc4
2025-12-04T08:57:06.5724570Z  * [new tag]                 v2.7.1-rc5                  -> v2.7.1-rc5
2025-12-04T08:57:06.5725750Z  * [new tag]                 v2.8.0                      -> v2.8.0
2025-12-04T08:57:06.5726999Z  * [new tag]                 v2.8.0-rc1                  -> v2.8.0-rc1
2025-12-04T08:57:06.5728309Z  * [new tag]                 v2.8.0-rc2                  -> v2.8.0-rc2
2025-12-04T08:57:06.5729653Z  * [new tag]                 v2.8.0-rc3                  -> v2.8.0-rc3
2025-12-04T08:57:06.5731041Z  * [new tag]                 v2.8.0-rc4                  -> v2.8.0-rc4
2025-12-04T08:57:06.5732397Z  * [new tag]                 v2.8.0-rc5                  -> v2.8.0-rc5
2025-12-04T08:57:06.5733751Z  * [new tag]                 v2.8.0-rc6                  -> v2.8.0-rc6
2025-12-04T08:57:06.5735060Z  * [new tag]                 v2.8.0-rc7                  -> v2.8.0-rc7
2025-12-04T08:57:06.5736323Z  * [new tag]                 v2.8.0-rc8                  -> v2.8.0-rc8
2025-12-04T08:57:06.5737668Z  * [new tag]                 v2.9.0                      -> v2.9.0
2025-12-04T08:57:06.5738986Z  * [new tag]                 v2.9.0-rc1                  -> v2.9.0-rc1
2025-12-04T08:57:06.5740264Z  * [new tag]                 v2.9.0-rc10                 -> v2.9.0-rc10
2025-12-04T08:57:06.5741614Z  * [new tag]                 v2.9.0-rc11                 -> v2.9.0-rc11
2025-12-04T08:57:06.5743237Z  * [new tag]                 v2.9.0-rc2                  -> v2.9.0-rc2
2025-12-04T08:57:06.5744461Z  * [new tag]                 v2.9.0-rc3                  -> v2.9.0-rc3
2025-12-04T08:57:06.5745827Z  * [new tag]                 v2.9.0-rc4                  -> v2.9.0-rc4
2025-12-04T08:57:06.5747071Z  * [new tag]                 v2.9.0-rc5                  -> v2.9.0-rc5
2025-12-04T08:57:06.5748545Z  * [new tag]                 v2.9.0-rc6                  -> v2.9.0-rc6
2025-12-04T08:57:06.5749835Z  * [new tag]                 v2.9.0-rc7                  -> v2.9.0-rc7
2025-12-04T08:57:06.5751266Z  * [new tag]                 v2.9.0-rc8                  -> v2.9.0-rc8
2025-12-04T08:57:06.5752429Z  * [new tag]                 v2.9.0-rc9                  -> v2.9.0-rc9
2025-12-04T08:57:06.5753524Z  * [new tag]                 v2.9.1                      -> v2.9.1
2025-12-04T08:57:06.5754804Z  * [new tag]                 v2.9.1-rc1                  -> v2.9.1-rc1
2025-12-04T08:57:06.5756101Z  * [new tag]                 v2.9.1-rc2                  -> v2.9.1-rc2
2025-12-04T08:57:06.5757821Z  * [new tag]                 viable/strict/1759343184    -> viable/strict/1759343184
2025-12-04T08:57:06.5759121Z  * [new tag]                 viable/strict/1759346540    -> viable/strict/1759346540
2025-12-04T08:57:06.5760475Z  * [new tag]                 viable/strict/1759348181    -> viable/strict/1759348181
2025-12-04T08:57:06.5761682Z  * [new tag]                 viable/strict/1759350324    -> viable/strict/1759350324
2025-12-04T08:57:06.5762871Z  * [new tag]                 viable/strict/1759351793    -> viable/strict/1759351793
2025-12-04T08:57:06.5764124Z  * [new tag]                 viable/strict/1759353844    -> viable/strict/1759353844
2025-12-04T08:57:06.5765347Z  * [new tag]                 viable/strict/1759355374    -> viable/strict/1759355374
2025-12-04T08:57:06.5766557Z  * [new tag]                 viable/strict/1759357472    -> viable/strict/1759357472
2025-12-04T08:57:06.5767754Z  * [new tag]                 viable/strict/1759361002    -> viable/strict/1759361002
2025-12-04T08:57:06.5769213Z  * [new tag]                 viable/strict/1759362585    -> viable/strict/1759362585
2025-12-04T08:57:06.5770657Z  * [new tag]                 viable/strict/1759365359    -> viable/strict/1759365359
2025-12-04T08:57:06.5771950Z  * [new tag]                 viable/strict/1759370089    -> viable/strict/1759370089
2025-12-04T08:57:06.5773324Z  * [new tag]                 viable/strict/1759377554    -> viable/strict/1759377554
2025-12-04T08:57:06.5774615Z  * [new tag]                 viable/strict/1759379133    -> viable/strict/1759379133
2025-12-04T08:57:06.5775903Z  * [new tag]                 viable/strict/1759389871    -> viable/strict/1759389871
2025-12-04T08:57:06.5777188Z  * [new tag]                 viable/strict/1759393562    -> viable/strict/1759393562
2025-12-04T08:57:06.5778490Z  * [new tag]                 viable/strict/1759395076    -> viable/strict/1759395076
2025-12-04T08:57:06.5779828Z  * [new tag]                 viable/strict/1759398579    -> viable/strict/1759398579
2025-12-04T08:57:06.5781139Z  * [new tag]                 viable/strict/1759404142    -> viable/strict/1759404142
2025-12-04T08:57:06.5782409Z  * [new tag]                 viable/strict/1759405773    -> viable/strict/1759405773
2025-12-04T08:57:06.5783652Z  * [new tag]                 viable/strict/1759408041    -> viable/strict/1759408041
2025-12-04T08:57:06.5784946Z  * [new tag]                 viable/strict/1759411593    -> viable/strict/1759411593
2025-12-04T08:57:06.5786194Z  * [new tag]                 viable/strict/1759427395    -> viable/strict/1759427395
2025-12-04T08:57:06.5787441Z  * [new tag]                 viable/strict/1759434582    -> viable/strict/1759434582
2025-12-04T08:57:06.5788705Z  * [new tag]                 viable/strict/1759436720    -> viable/strict/1759436720
2025-12-04T08:57:06.5790014Z  * [new tag]                 viable/strict/1759440219    -> viable/strict/1759440219
2025-12-04T08:57:06.5791285Z  * [new tag]                 viable/strict/1759441948    -> viable/strict/1759441948
2025-12-04T08:57:06.5792643Z  * [new tag]                 viable/strict/1759443860    -> viable/strict/1759443860
2025-12-04T08:57:06.5793880Z  * [new tag]                 viable/strict/1759445377    -> viable/strict/1759445377
2025-12-04T08:57:06.5795137Z  * [new tag]                 viable/strict/1759447415    -> viable/strict/1759447415
2025-12-04T08:57:06.5796396Z  * [new tag]                 viable/strict/1759451750    -> viable/strict/1759451750
2025-12-04T08:57:06.5797671Z  * [new tag]                 viable/strict/1759453910    -> viable/strict/1759453910
2025-12-04T08:57:06.5798996Z  * [new tag]                 viable/strict/1759456483    -> viable/strict/1759456483
2025-12-04T08:57:06.5800414Z  * [new tag]                 viable/strict/1759459279    -> viable/strict/1759459279
2025-12-04T08:57:06.5801672Z  * [new tag]                 viable/strict/1759460742    -> viable/strict/1759460742
2025-12-04T08:57:06.5802942Z  * [new tag]                 viable/strict/1759462025    -> viable/strict/1759462025
2025-12-04T08:57:06.5804239Z  * [new tag]                 viable/strict/1759469086    -> viable/strict/1759469086
2025-12-04T08:57:06.5805603Z  * [new tag]                 viable/strict/1759470581    -> viable/strict/1759470581
2025-12-04T08:57:06.5806927Z  * [new tag]                 viable/strict/1759472786    -> viable/strict/1759472786
2025-12-04T08:57:06.5808199Z  * [new tag]                 viable/strict/1759476294    -> viable/strict/1759476294
2025-12-04T08:57:06.5809473Z  * [new tag]                 viable/strict/1759479963    -> viable/strict/1759479963
2025-12-04T08:57:06.5810755Z  * [new tag]                 viable/strict/1759492177    -> viable/strict/1759492177
2025-12-04T08:57:06.5812017Z  * [new tag]                 viable/strict/1759519278    -> viable/strict/1759519278
2025-12-04T08:57:06.5813339Z  * [new tag]                 viable/strict/1759524580    -> viable/strict/1759524580
2025-12-04T08:57:06.5814654Z  * [new tag]                 viable/strict/1759528193    -> viable/strict/1759528193
2025-12-04T08:57:06.5816079Z  * [new tag]                 viable/strict/1759533797    -> viable/strict/1759533797
2025-12-04T08:57:06.5817488Z  * [new tag]                 viable/strict/1759542780    -> viable/strict/1759542780
2025-12-04T08:57:06.5818883Z  * [new tag]                 viable/strict/1759549779    -> viable/strict/1759549779
2025-12-04T08:57:06.5820159Z  * [new tag]                 viable/strict/1759555455    -> viable/strict/1759555455
2025-12-04T08:57:06.5821446Z  * [new tag]                 viable/strict/1759559176    -> viable/strict/1759559176
2025-12-04T08:57:06.5822710Z  * [new tag]                 viable/strict/1759560629    -> viable/strict/1759560629
2025-12-04T08:57:06.5823999Z  * [new tag]                 viable/strict/1759569848    -> viable/strict/1759569848
2025-12-04T08:57:06.5825432Z  * [new tag]                 viable/strict/1759571382    -> viable/strict/1759571382
2025-12-04T08:57:06.5826745Z  * [new tag]                 viable/strict/1759573474    -> viable/strict/1759573474
2025-12-04T08:57:06.5827997Z  * [new tag]                 viable/strict/1759618187    -> viable/strict/1759618187
2025-12-04T08:57:06.5829301Z  * [new tag]                 viable/strict/1759626742    -> viable/strict/1759626742
2025-12-04T08:57:06.5830679Z  * [new tag]                 viable/strict/1759632427    -> viable/strict/1759632427
2025-12-04T08:57:06.5831944Z  * [new tag]                 viable/strict/1759634971    -> viable/strict/1759634971
2025-12-04T08:57:06.5833226Z  * [new tag]                 viable/strict/1759661382    -> viable/strict/1759661382
2025-12-04T08:57:06.5834499Z  * [new tag]                 viable/strict/1759663294    -> viable/strict/1759663294
2025-12-04T08:57:06.5835703Z  * [new tag]                 viable/strict/1759708178    -> viable/strict/1759708178
2025-12-04T08:57:06.5837535Z  * [new tag]                 viable/strict/1759715695    -> viable/strict/1759715695
2025-12-04T08:57:06.5838936Z  * [new tag]                 viable/strict/1759728293    -> viable/strict/1759728293
2025-12-04T08:57:06.5840236Z  * [new tag]                 viable/strict/1759735513    -> viable/strict/1759735513
2025-12-04T08:57:06.5841601Z  * [new tag]                 viable/strict/1759739177    -> viable/strict/1759739177
2025-12-04T08:57:06.5842843Z  * [new tag]                 viable/strict/1759758635    -> viable/strict/1759758635
2025-12-04T08:57:06.5844093Z  * [new tag]                 viable/strict/1759765784    -> viable/strict/1759765784
2025-12-04T08:57:06.5845351Z  * [new tag]                 viable/strict/1759767948    -> viable/strict/1759767948
2025-12-04T08:57:06.5846666Z  * [new tag]                 viable/strict/1759771461    -> viable/strict/1759771461
2025-12-04T08:57:06.5847905Z  * [new tag]                 viable/strict/1759776706    -> viable/strict/1759776706
2025-12-04T08:57:06.5849215Z  * [new tag]                 viable/strict/1759782317    -> viable/strict/1759782317
2025-12-04T08:57:06.5850555Z  * [new tag]                 viable/strict/1759783777    -> viable/strict/1759783777
2025-12-04T08:57:06.5851904Z  * [new tag]                 viable/strict/1759785815    -> viable/strict/1759785815
2025-12-04T08:57:06.5853170Z  * [new tag]                 viable/strict/1759789459    -> viable/strict/1759789459
2025-12-04T08:57:06.5854432Z  * [new tag]                 viable/strict/1759790974    -> viable/strict/1759790974
2025-12-04T08:57:06.5855679Z  * [new tag]                 viable/strict/1759794583    -> viable/strict/1759794583
2025-12-04T08:57:06.5856970Z  * [new tag]                 viable/strict/1759797408    -> viable/strict/1759797408
2025-12-04T08:57:06.5858266Z  * [new tag]                 viable/strict/1759799518    -> viable/strict/1759799518
2025-12-04T08:57:06.5859533Z  * [new tag]                 viable/strict/1759804909    -> viable/strict/1759804909
2025-12-04T08:57:06.5860808Z  * [new tag]                 viable/strict/1759807643    -> viable/strict/1759807643
2025-12-04T08:57:06.5862136Z  * [new tag]                 viable/strict/1759809089    -> viable/strict/1759809089
2025-12-04T08:57:06.5863387Z  * [new tag]                 viable/strict/1759811145    -> viable/strict/1759811145
2025-12-04T08:57:06.5864690Z  * [new tag]                 viable/strict/1759812581    -> viable/strict/1759812581
2025-12-04T08:57:06.5865974Z  * [new tag]                 viable/strict/1759814683    -> viable/strict/1759814683
2025-12-04T08:57:06.5867277Z  * [new tag]                 viable/strict/1759821889    -> viable/strict/1759821889
2025-12-04T08:57:06.5868564Z  * [new tag]                 viable/strict/1759823376    -> viable/strict/1759823376
2025-12-04T08:57:06.5869845Z  * [new tag]                 viable/strict/1759827107    -> viable/strict/1759827107
2025-12-04T08:57:06.5871124Z  * [new tag]                 viable/strict/1759830577    -> viable/strict/1759830577
2025-12-04T08:57:06.5872461Z  * [new tag]                 viable/strict/1759832720    -> viable/strict/1759832720
2025-12-04T08:57:06.5873740Z  * [new tag]                 viable/strict/1759842063    -> viable/strict/1759842063
2025-12-04T08:57:06.5874978Z  * [new tag]                 viable/strict/1759847121    -> viable/strict/1759847121
2025-12-04T08:57:06.5876491Z  * [new tag]                 viable/strict/1759850721    -> viable/strict/1759850721
2025-12-04T08:57:06.5877763Z  * [new tag]                 viable/strict/1759857870    -> viable/strict/1759857870
2025-12-04T08:57:06.5879071Z  * [new tag]                 viable/strict/1759863143    -> viable/strict/1759863143
2025-12-04T08:57:06.5880434Z  * [new tag]                 viable/strict/1759875874    -> viable/strict/1759875874
2025-12-04T08:57:06.5881644Z  * [new tag]                 viable/strict/1759877385    -> viable/strict/1759877385
2025-12-04T08:57:06.5882914Z  * [new tag]                 viable/strict/1759883801    -> viable/strict/1759883801
2025-12-04T08:57:06.5884195Z  * [new tag]                 viable/strict/1759885922    -> viable/strict/1759885922
2025-12-04T08:57:06.5885590Z  * [new tag]                 viable/strict/1759888488    -> viable/strict/1759888488
2025-12-04T08:57:06.5886804Z  * [new tag]                 viable/strict/1759895471    -> viable/strict/1759895471
2025-12-04T08:57:06.5888077Z  * [new tag]                 viable/strict/1759904803    -> viable/strict/1759904803
2025-12-04T08:57:06.5889537Z  * [new tag]                 viable/strict/1759908300    -> viable/strict/1759908300
2025-12-04T08:57:06.5890887Z  * [new tag]                 viable/strict/1759915520    -> viable/strict/1759915520
2025-12-04T08:57:06.5892164Z  * [new tag]                 viable/strict/1759916978    -> viable/strict/1759916978
2025-12-04T08:57:06.5893381Z  * [new tag]                 viable/strict/1759930024    -> viable/strict/1759930024
2025-12-04T08:57:06.5894634Z  * [new tag]                 viable/strict/1759948122    -> viable/strict/1759948122
2025-12-04T08:57:06.5895914Z  * [new tag]                 viable/strict/1759952983    -> viable/strict/1759952983
2025-12-04T08:57:06.5897302Z  * [new tag]                 viable/strict/1759955121    -> viable/strict/1759955121
2025-12-04T08:57:06.5898890Z  * [new tag]                 viable/strict/1759962298    -> viable/strict/1759962298
2025-12-04T08:57:06.5900216Z  * [new tag]                 viable/strict/1759965837    -> viable/strict/1759965837
2025-12-04T08:57:06.5901499Z  * [new tag]                 viable/strict/1759970213    -> viable/strict/1759970213
2025-12-04T08:57:06.5902880Z  * [new tag]                 viable/strict/1759974894    -> viable/strict/1759974894
2025-12-04T08:57:06.5904164Z  * [new tag]                 viable/strict/1759977763    -> viable/strict/1759977763
2025-12-04T08:57:06.5905515Z  * [new tag]                 viable/strict/1759979241    -> viable/strict/1759979241
2025-12-04T08:57:06.5906797Z  * [new tag]                 viable/strict/1759985417    -> viable/strict/1759985417
2025-12-04T08:57:06.5908076Z  * [new tag]                 viable/strict/1759987490    -> viable/strict/1759987490
2025-12-04T08:57:06.5909363Z  * [new tag]                 viable/strict/1759996180    -> viable/strict/1759996180
2025-12-04T08:57:06.5922925Z  * [new tag]                 viable/strict/1760065682    -> viable/strict/1760065682
2025-12-04T08:57:06.5923303Z  * [new tag]                 viable/strict/1760066894    -> viable/strict/1760066894
2025-12-04T08:57:06.5923483Z  * [new tag]                 viable/strict/1760070345    -> viable/strict/1760070345
2025-12-04T08:57:06.5923630Z  * [new tag]                 viable/strict/1760089782    -> viable/strict/1760089782
2025-12-04T08:57:06.5923765Z  * [new tag]                 viable/strict/1760091921    -> viable/strict/1760091921
2025-12-04T08:57:06.5923915Z  * [new tag]                 viable/strict/1760127924    -> viable/strict/1760127924
2025-12-04T08:57:06.5924052Z  * [new tag]                 viable/strict/1760129489    -> viable/strict/1760129489
2025-12-04T08:57:06.5924180Z  * [new tag]                 viable/strict/1760132980    -> viable/strict/1760132980
2025-12-04T08:57:06.5924421Z  * [new tag]                 viable/strict/1760135060    -> viable/strict/1760135060
2025-12-04T08:57:06.5924653Z  * [new tag]                 viable/strict/1760215782    -> viable/strict/1760215782
2025-12-04T08:57:06.5924812Z  * [new tag]                 viable/strict/1760273849    -> viable/strict/1760273849
2025-12-04T08:57:06.5925999Z  * [new tag]                 viable/strict/1760275517    -> viable/strict/1760275517
2025-12-04T08:57:06.5927214Z  * [new tag]                 viable/strict/1760276979    -> viable/strict/1760276979
2025-12-04T08:57:06.5928469Z  * [new tag]                 viable/strict/1760279007    -> viable/strict/1760279007
2025-12-04T08:57:06.5929821Z  * [new tag]                 viable/strict/1760286328    -> viable/strict/1760286328
2025-12-04T08:57:06.5931037Z  * [new tag]                 viable/strict/1760493304    -> viable/strict/1760493304
2025-12-04T08:57:06.5932322Z  * [new tag]                 viable/strict/1760496298    -> viable/strict/1760496298
2025-12-04T08:57:06.5933813Z  * [new tag]                 viable/strict/1760518396    -> viable/strict/1760518396
2025-12-04T08:57:06.5935083Z  * [new tag]                 viable/strict/1760534864    -> viable/strict/1760534864
2025-12-04T08:57:06.5936296Z  * [new tag]                 viable/strict/1760549062    -> viable/strict/1760549062
2025-12-04T08:57:06.5937597Z  * [new tag]                 viable/strict/1760552799    -> viable/strict/1760552799
2025-12-04T08:57:06.5938933Z  * [new tag]                 viable/strict/1760554355    -> viable/strict/1760554355
2025-12-04T08:57:06.5940289Z  * [new tag]                 viable/strict/1760556275    -> viable/strict/1760556275
2025-12-04T08:57:06.5941578Z  * [new tag]                 viable/strict/1760564979    -> viable/strict/1760564979
2025-12-04T08:57:06.5942906Z  * [new tag]                 viable/strict/1760567049    -> viable/strict/1760567049
2025-12-04T08:57:06.5944502Z  * [new tag]                 viable/strict/1760568585    -> viable/strict/1760568585
2025-12-04T08:57:06.5945838Z  * [new tag]                 viable/strict/1760570630    -> viable/strict/1760570630
2025-12-04T08:57:06.5947117Z  * [new tag]                 viable/strict/1760572180    -> viable/strict/1760572180
2025-12-04T08:57:06.5948371Z  * [new tag]                 viable/strict/1760575094    -> viable/strict/1760575094
2025-12-04T08:57:06.5949754Z  * [new tag]                 viable/strict/1760579709    -> viable/strict/1760579709
2025-12-04T08:57:06.5951407Z  * [new tag]                 viable/strict/1760582614    -> viable/strict/1760582614
2025-12-04T08:57:06.5952750Z  * [new tag]                 viable/strict/1760586815    -> viable/strict/1760586815
2025-12-04T08:57:06.5953952Z  * [new tag]                 viable/strict/1760588829    -> viable/strict/1760588829
2025-12-04T08:57:06.5955191Z  * [new tag]                 viable/strict/1760590200    -> viable/strict/1760590200
2025-12-04T08:57:06.5956516Z  * [new tag]                 viable/strict/1760592311    -> viable/strict/1760592311
2025-12-04T08:57:06.5957782Z  * [new tag]                 viable/strict/1760619733    -> viable/strict/1760619733
2025-12-04T08:57:06.5959001Z  * [new tag]                 viable/strict/1760628335    -> viable/strict/1760628335
2025-12-04T08:57:06.5960314Z  * [new tag]                 viable/strict/1760635490    -> viable/strict/1760635490
2025-12-04T08:57:06.5961617Z  * [new tag]                 viable/strict/1760640743    -> viable/strict/1760640743
2025-12-04T08:57:06.5962890Z  * [new tag]                 viable/strict/1760642528    -> viable/strict/1760642528
2025-12-04T08:57:06.5964174Z  * [new tag]                 viable/strict/1760646330    -> viable/strict/1760646330
2025-12-04T08:57:06.5965434Z  * [new tag]                 viable/strict/1760666101    -> viable/strict/1760666101
2025-12-04T08:57:06.5966737Z  * [new tag]                 viable/strict/1760668990    -> viable/strict/1760668990
2025-12-04T08:57:06.5968107Z  * [new tag]                 viable/strict/1760670600    -> viable/strict/1760670600
2025-12-04T08:57:06.5969371Z  * [new tag]                 viable/strict/1760671704    -> viable/strict/1760671704
2025-12-04T08:57:06.5970660Z  * [new tag]                 viable/strict/1760673121    -> viable/strict/1760673121
2025-12-04T08:57:06.5971947Z  * [new tag]                 viable/strict/1760675352    -> viable/strict/1760675352
2025-12-04T08:57:06.5973231Z  * [new tag]                 viable/strict/1760696731    -> viable/strict/1760696731
2025-12-04T08:57:06.5975681Z  * [new tag]                 viable/strict/1760723515    -> viable/strict/1760723515
2025-12-04T08:57:06.5976990Z  * [new tag]                 viable/strict/1760727234    -> viable/strict/1760727234
2025-12-04T08:57:06.5978338Z  * [new tag]                 viable/strict/1760730578    -> viable/strict/1760730578
2025-12-04T08:57:06.5979600Z  * [new tag]                 viable/strict/1760732726    -> viable/strict/1760732726
2025-12-04T08:57:06.5980876Z  * [new tag]                 viable/strict/1760734180    -> viable/strict/1760734180
2025-12-04T08:57:06.5982268Z  * [new tag]                 viable/strict/1760736251    -> viable/strict/1760736251
2025-12-04T08:57:06.5983567Z  * [new tag]                 viable/strict/1760737772    -> viable/strict/1760737772
2025-12-04T08:57:06.5984990Z  * [new tag]                 viable/strict/1760758005    -> viable/strict/1760758005
2025-12-04T08:57:06.5986243Z  * [new tag]                 viable/strict/1760761532    -> viable/strict/1760761532
2025-12-04T08:57:06.5987529Z  * [new tag]                 viable/strict/1760802581    -> viable/strict/1760802581
2025-12-04T08:57:06.5988811Z  * [new tag]                 viable/strict/1760827772    -> viable/strict/1760827772
2025-12-04T08:57:06.5990120Z  * [new tag]                 viable/strict/1760834524    -> viable/strict/1760834524
2025-12-04T08:57:06.5991467Z  * [new tag]                 viable/strict/1760845009    -> viable/strict/1760845009
2025-12-04T08:57:06.5992850Z  * [new tag]                 viable/strict/1760876836    -> viable/strict/1760876836
2025-12-04T08:57:06.5994138Z  * [new tag]                 viable/strict/1760880329    -> viable/strict/1760880329
2025-12-04T08:57:06.5995411Z  * [new tag]                 viable/strict/1760888987    -> viable/strict/1760888987
2025-12-04T08:57:06.5996667Z  * [new tag]                 viable/strict/1760912664    -> viable/strict/1760912664
2025-12-04T08:57:06.5997931Z  * [new tag]                 viable/strict/1760925321    -> viable/strict/1760925321
2025-12-04T08:57:06.5999202Z  * [new tag]                 viable/strict/1760931488    -> viable/strict/1760931488
2025-12-04T08:57:06.6000611Z  * [new tag]                 viable/strict/1760932693    -> viable/strict/1760932693
2025-12-04T08:57:06.6001931Z  * [new tag]                 viable/strict/1761004184    -> viable/strict/1761004184
2025-12-04T08:57:06.6003242Z  * [new tag]                 viable/strict/1761014748    -> viable/strict/1761014748
2025-12-04T08:57:06.6004531Z  * [new tag]                 viable/strict/1761017491    -> viable/strict/1761017491
2025-12-04T08:57:06.6005815Z  * [new tag]                 viable/strict/1761018806    -> viable/strict/1761018806
2025-12-04T08:57:06.6007601Z  * [new tag]                 viable/strict/1761020754    -> viable/strict/1761020754
2025-12-04T08:57:06.6008893Z  * [new tag]                 viable/strict/1761024303    -> viable/strict/1761024303
2025-12-04T08:57:06.6010186Z  * [new tag]                 viable/strict/1761029582    -> viable/strict/1761029582
2025-12-04T08:57:06.6011454Z  * [new tag]                 viable/strict/1761031535    -> viable/strict/1761031535
2025-12-04T08:57:06.6012723Z  * [new tag]                 viable/strict/1761035196    -> viable/strict/1761035196
2025-12-04T08:57:06.6014094Z  * [new tag]                 viable/strict/1761045825    -> viable/strict/1761045825
2025-12-04T08:57:06.6015410Z  * [new tag]                 viable/strict/1761054796    -> viable/strict/1761054796
2025-12-04T08:57:06.6016699Z  * [new tag]                 viable/strict/1761060314    -> viable/strict/1761060314
2025-12-04T08:57:06.6020300Z  * [new tag]                 viable/strict/1761071198    -> viable/strict/1761071198
2025-12-04T08:57:06.6021650Z  * [new tag]                 viable/strict/1761074628    -> viable/strict/1761074628
2025-12-04T08:57:06.6022980Z  * [new tag]                 viable/strict/1761078351    -> viable/strict/1761078351
2025-12-04T08:57:06.6024269Z  * [new tag]                 viable/strict/1761079822    -> viable/strict/1761079822
2025-12-04T08:57:06.6025535Z  * [new tag]                 viable/strict/1761081873    -> viable/strict/1761081873
2025-12-04T08:57:06.6026859Z  * [new tag]                 viable/strict/1761083392    -> viable/strict/1761083392
2025-12-04T08:57:06.6028198Z  * [new tag]                 viable/strict/1761085465    -> viable/strict/1761085465
2025-12-04T08:57:06.6029490Z  * [new tag]                 viable/strict/1761089099    -> viable/strict/1761089099
2025-12-04T08:57:06.6030776Z  * [new tag]                 viable/strict/1761095535    -> viable/strict/1761095535
2025-12-04T08:57:06.6032270Z  * [new tag]                 viable/strict/1761098119    -> viable/strict/1761098119
2025-12-04T08:57:06.6033808Z  * [new tag]                 viable/strict/1761101330    -> viable/strict/1761101330
2025-12-04T08:57:06.6035060Z  * [new tag]                 viable/strict/1761114425    -> viable/strict/1761114425
2025-12-04T08:57:06.6036332Z  * [new tag]                 viable/strict/1761116036    -> viable/strict/1761116036
2025-12-04T08:57:06.6037659Z  * [new tag]                 viable/strict/1761119379    -> viable/strict/1761119379
2025-12-04T08:57:06.6038934Z  * [new tag]                 viable/strict/1761121601    -> viable/strict/1761121601
2025-12-04T08:57:06.6040317Z  * [new tag]                 viable/strict/1761123234    -> viable/strict/1761123234
2025-12-04T08:57:06.6041633Z  * [new tag]                 viable/strict/1761126621    -> viable/strict/1761126621
2025-12-04T08:57:06.6043041Z  * [new tag]                 viable/strict/1761132259    -> viable/strict/1761132259
2025-12-04T08:57:06.6044290Z  * [new tag]                 viable/strict/1761146746    -> viable/strict/1761146746
2025-12-04T08:57:06.6045559Z  * [new tag]                 viable/strict/1761164752    -> viable/strict/1761164752
2025-12-04T08:57:06.6046839Z  * [new tag]                 viable/strict/1761166198    -> viable/strict/1761166198
2025-12-04T08:57:06.6048210Z  * [new tag]                 viable/strict/1761175424    -> viable/strict/1761175424
2025-12-04T08:57:06.6049494Z  * [new tag]                 viable/strict/1761176983    -> viable/strict/1761176983
2025-12-04T08:57:06.6050864Z  * [new tag]                 viable/strict/1761179891    -> viable/strict/1761179891
2025-12-04T08:57:06.6052200Z  * [new tag]                 viable/strict/1761181930    -> viable/strict/1761181930
2025-12-04T08:57:06.6053548Z  * [new tag]                 viable/strict/1761184516    -> viable/strict/1761184516
2025-12-04T08:57:06.6054871Z  * [new tag]                 viable/strict/1761190179    -> viable/strict/1761190179
2025-12-04T08:57:06.6056143Z  * [new tag]                 viable/strict/1761193558    -> viable/strict/1761193558
2025-12-04T08:57:06.6057398Z  * [new tag]                 viable/strict/1761207990    -> viable/strict/1761207990
2025-12-04T08:57:06.6058680Z  * [new tag]                 viable/strict/1761229539    -> viable/strict/1761229539
2025-12-04T08:57:06.6060166Z  * [new tag]                 viable/strict/1761244031    -> viable/strict/1761244031
2025-12-04T08:57:06.6061434Z  * [new tag]                 viable/strict/1761248986    -> viable/strict/1761248986
2025-12-04T08:57:06.6062839Z  * [new tag]                 viable/strict/1761259791    -> viable/strict/1761259791
2025-12-04T08:57:06.6064111Z  * [new tag]                 viable/strict/1761266139    -> viable/strict/1761266139
2025-12-04T08:57:06.6065426Z  * [new tag]                 viable/strict/1761268316    -> viable/strict/1761268316
2025-12-04T08:57:06.6066726Z  * [new tag]                 viable/strict/1761273805    -> viable/strict/1761273805
2025-12-04T08:57:06.6068034Z  * [new tag]                 viable/strict/1761275261    -> viable/strict/1761275261
2025-12-04T08:57:06.6069377Z  * [new tag]                 viable/strict/1761277913    -> viable/strict/1761277913
2025-12-04T08:57:06.6070714Z  * [new tag]                 viable/strict/1761290701    -> viable/strict/1761290701
2025-12-04T08:57:06.6072028Z  * [new tag]                 viable/strict/1761294396    -> viable/strict/1761294396
2025-12-04T08:57:06.6073301Z  * [new tag]                 viable/strict/1761303047    -> viable/strict/1761303047
2025-12-04T08:57:06.6074612Z  * [new tag]                 viable/strict/1761335388    -> viable/strict/1761335388
2025-12-04T08:57:06.6075912Z  * [new tag]                 viable/strict/1761337551    -> viable/strict/1761337551
2025-12-04T08:57:06.6077180Z  * [new tag]                 viable/strict/1761339007    -> viable/strict/1761339007
2025-12-04T08:57:06.6078441Z  * [new tag]                 viable/strict/1761341050    -> viable/strict/1761341050
2025-12-04T08:57:06.6079854Z  * [new tag]                 viable/strict/1761346188    -> viable/strict/1761346188
2025-12-04T08:57:06.6081416Z  * [new tag]                 viable/strict/1761349792    -> viable/strict/1761349792
2025-12-04T08:57:06.6082674Z  * [new tag]                 viable/strict/1761352620    -> viable/strict/1761352620
2025-12-04T08:57:06.6083957Z  * [new tag]                 viable/strict/1761354730    -> viable/strict/1761354730
2025-12-04T08:57:06.6085261Z  * [new tag]                 viable/strict/1761357298    -> viable/strict/1761357298
2025-12-04T08:57:06.6086531Z  * [new tag]                 viable/strict/1761360201    -> viable/strict/1761360201
2025-12-04T08:57:06.6087804Z  * [new tag]                 viable/strict/1761361753    -> viable/strict/1761361753
2025-12-04T08:57:06.6089139Z  * [new tag]                 viable/strict/1761364351    -> viable/strict/1761364351
2025-12-04T08:57:06.6090447Z  * [new tag]                 viable/strict/1761366338    -> viable/strict/1761366338
2025-12-04T08:57:06.6091814Z  * [new tag]                 viable/strict/1761367802    -> viable/strict/1761367802
2025-12-04T08:57:06.6093149Z  * [new tag]                 viable/strict/1761369889    -> viable/strict/1761369889
2025-12-04T08:57:06.6094978Z  * [new tag]                 viable/strict/1761371385    -> viable/strict/1761371385
2025-12-04T08:57:06.6096310Z  * [new tag]                 viable/strict/1761373581    -> viable/strict/1761373581
2025-12-04T08:57:06.6097690Z  * [new tag]                 viable/strict/1761375054    -> viable/strict/1761375054
2025-12-04T08:57:06.6099052Z  * [new tag]                 viable/strict/1761421785    -> viable/strict/1761421785
2025-12-04T08:57:06.6100475Z  * [new tag]                 viable/strict/1761434614    -> viable/strict/1761434614
2025-12-04T08:57:06.6101930Z  * [new tag]                 viable/strict/1761439254    -> viable/strict/1761439254
2025-12-04T08:57:06.6103352Z  * [new tag]                 viable/strict/1761454187    -> viable/strict/1761454187
2025-12-04T08:57:06.6104708Z  * [new tag]                 viable/strict/1761459991    -> viable/strict/1761459991
2025-12-04T08:57:06.6106249Z  * [new tag]                 viable/strict/1761470668    -> viable/strict/1761470668
2025-12-04T08:57:06.6107840Z  * [new tag]                 viable/strict/1761472188    -> viable/strict/1761472188
2025-12-04T08:57:06.6109176Z  * [new tag]                 viable/strict/1761503178    -> viable/strict/1761503178
2025-12-04T08:57:06.6110530Z  * [new tag]                 viable/strict/1761517492    -> viable/strict/1761517492
2025-12-04T08:57:06.6111876Z  * [new tag]                 viable/strict/1761518981    -> viable/strict/1761518981
2025-12-04T08:57:06.6113234Z  * [new tag]                 viable/strict/1761533609    -> viable/strict/1761533609
2025-12-04T08:57:06.6114456Z  * [new tag]                 viable/strict/1761546438    -> viable/strict/1761546438
2025-12-04T08:57:06.6115783Z  * [new tag]                 viable/strict/1761548133    -> viable/strict/1761548133
2025-12-04T08:57:06.6117559Z  * [new tag]                 viable/strict/1761555186    -> viable/strict/1761555186
2025-12-04T08:57:06.6118863Z  * [new tag]                 viable/strict/1761557178    -> viable/strict/1761557178
2025-12-04T08:57:06.6120300Z  * [new tag]                 viable/strict/1761560772    -> viable/strict/1761560772
2025-12-04T08:57:06.6121636Z  * [new tag]                 viable/strict/1761562266    -> viable/strict/1761562266
2025-12-04T08:57:06.6123043Z  * [new tag]                 viable/strict/1761564260    -> viable/strict/1761564260
2025-12-04T08:57:06.6124293Z  * [new tag]                 viable/strict/1761568072    -> viable/strict/1761568072
2025-12-04T08:57:06.6125562Z  * [new tag]                 viable/strict/1761571683    -> viable/strict/1761571683
2025-12-04T08:57:06.6126783Z  * [new tag]                 viable/strict/1761580199    -> viable/strict/1761580199
2025-12-04T08:57:06.6128070Z  * [new tag]                 viable/strict/1761587383    -> viable/strict/1761587383
2025-12-04T08:57:06.6129637Z  * [new tag]                 viable/strict/1761591165    -> viable/strict/1761591165
2025-12-04T08:57:06.6130892Z  * [new tag]                 viable/strict/1761594575    -> viable/strict/1761594575
2025-12-04T08:57:06.6132142Z  * [new tag]                 viable/strict/1761596710    -> viable/strict/1761596710
2025-12-04T08:57:06.6133455Z  * [new tag]                 viable/strict/1761598189    -> viable/strict/1761598189
2025-12-04T08:57:06.6134783Z  * [new tag]                 viable/strict/1761600254    -> viable/strict/1761600254
2025-12-04T08:57:06.6136102Z  * [new tag]                 viable/strict/1761603879    -> viable/strict/1761603879
2025-12-04T08:57:06.6137473Z  * [new tag]                 viable/strict/1761605429    -> viable/strict/1761605429
2025-12-04T08:57:06.6138865Z  * [new tag]                 viable/strict/1761607468    -> viable/strict/1761607468
2025-12-04T08:57:06.6140166Z  * [new tag]                 viable/strict/1761608983    -> viable/strict/1761608983
2025-12-04T08:57:06.6141542Z  * [new tag]                 viable/strict/1761611846    -> viable/strict/1761611846
2025-12-04T08:57:06.6142903Z  * [new tag]                 viable/strict/1761613922    -> viable/strict/1761613922
2025-12-04T08:57:06.6144122Z  * [new tag]                 viable/strict/1761616504    -> viable/strict/1761616504
2025-12-04T08:57:06.6145332Z  * [new tag]                 viable/strict/1761619599    -> viable/strict/1761619599
2025-12-04T08:57:06.6146597Z  * [new tag]                 viable/strict/1761686693    -> viable/strict/1761686693
2025-12-04T08:57:06.6147887Z  * [new tag]                 viable/strict/1761688179    -> viable/strict/1761688179
2025-12-04T08:57:06.6149290Z  * [new tag]                 viable/strict/1761691973    -> viable/strict/1761691973
2025-12-04T08:57:06.6150756Z  * [new tag]                 viable/strict/1761693884    -> viable/strict/1761693884
2025-12-04T08:57:06.6152133Z  * [new tag]                 viable/strict/1761695389    -> viable/strict/1761695389
2025-12-04T08:57:06.6153513Z  * [new tag]                 viable/strict/1761698408    -> viable/strict/1761698408
2025-12-04T08:57:06.6154879Z  * [new tag]                 viable/strict/1761702931    -> viable/strict/1761702931
2025-12-04T08:57:06.6156197Z  * [new tag]                 viable/strict/1761706307    -> viable/strict/1761706307
2025-12-04T08:57:06.6157528Z  * [new tag]                 viable/strict/1761709065    -> viable/strict/1761709065
2025-12-04T08:57:06.6158974Z  * [new tag]                 viable/strict/1761710285    -> viable/strict/1761710285
2025-12-04T08:57:06.6160502Z  * [new tag]                 viable/strict/1761711983    -> viable/strict/1761711983
2025-12-04T08:57:06.6161915Z  * [new tag]                 viable/strict/1761713514    -> viable/strict/1761713514
2025-12-04T08:57:06.6163290Z  * [new tag]                 viable/strict/1761715523    -> viable/strict/1761715523
2025-12-04T08:57:06.6164715Z  * [new tag]                 viable/strict/1761727973    -> viable/strict/1761727973
2025-12-04T08:57:06.6166073Z  * [new tag]                 viable/strict/1761751558    -> viable/strict/1761751558
2025-12-04T08:57:06.6167415Z  * [new tag]                 viable/strict/1761755187    -> viable/strict/1761755187
2025-12-04T08:57:06.6168785Z  * [new tag]                 viable/strict/1761756826    -> viable/strict/1761756826
2025-12-04T08:57:06.6170177Z  * [new tag]                 viable/strict/1761769551    -> viable/strict/1761769551
2025-12-04T08:57:06.6171559Z  * [new tag]                 viable/strict/1761771032    -> viable/strict/1761771032
2025-12-04T08:57:06.6172796Z  * [new tag]                 viable/strict/1761773101    -> viable/strict/1761773101
2025-12-04T08:57:06.6174179Z  * [new tag]                 viable/strict/1761781792    -> viable/strict/1761781792
2025-12-04T08:57:06.6175587Z  * [new tag]                 viable/strict/1761784788    -> viable/strict/1761784788
2025-12-04T08:57:06.6176888Z  * [new tag]                 viable/strict/1761786740    -> viable/strict/1761786740
2025-12-04T08:57:06.6178380Z  * [new tag]                 viable/strict/1761789332    -> viable/strict/1761789332
2025-12-04T08:57:06.6180004Z  * [new tag]                 viable/strict/1761792569    -> viable/strict/1761792569
2025-12-04T08:57:06.6181358Z  * [new tag]                 viable/strict/1761795289    -> viable/strict/1761795289
2025-12-04T08:57:06.6183127Z  * [new tag]                 viable/strict/1761798345    -> viable/strict/1761798345
2025-12-04T08:57:06.6184516Z  * [new tag]                 viable/strict/1761799827    -> viable/strict/1761799827
2025-12-04T08:57:06.6185900Z  * [new tag]                 viable/strict/1761805604    -> viable/strict/1761805604
2025-12-04T08:57:06.6187274Z  * [new tag]                 viable/strict/1761807202    -> viable/strict/1761807202
2025-12-04T08:57:06.6188598Z  * [new tag]                 viable/strict/1761809094    -> viable/strict/1761809094
2025-12-04T08:57:06.6189942Z  * [new tag]                 viable/strict/1761810576    -> viable/strict/1761810576
2025-12-04T08:57:06.6191340Z  * [new tag]                 viable/strict/1761812771    -> viable/strict/1761812771
2025-12-04T08:57:06.6192738Z  * [new tag]                 viable/strict/1761814363    -> viable/strict/1761814363
2025-12-04T08:57:06.6194180Z  * [new tag]                 viable/strict/1761857410    -> viable/strict/1761857410
2025-12-04T08:57:06.6195538Z  * [new tag]                 viable/strict/1761860985    -> viable/strict/1761860985
2025-12-04T08:57:06.6196903Z  * [new tag]                 viable/strict/1761863094    -> viable/strict/1761863094
2025-12-04T08:57:06.6198247Z  * [new tag]                 viable/strict/1761864590    -> viable/strict/1761864590
2025-12-04T08:57:06.6199588Z  * [new tag]                 viable/strict/1761866675    -> viable/strict/1761866675
2025-12-04T08:57:06.6201240Z  * [new tag]                 viable/strict/1761868178    -> viable/strict/1761868178
2025-12-04T08:57:06.6202578Z  * [new tag]                 viable/strict/1761871111    -> viable/strict/1761871111
2025-12-04T08:57:06.6203980Z  * [new tag]                 viable/strict/1761873126    -> viable/strict/1761873126
2025-12-04T08:57:06.6205389Z  * [new tag]                 viable/strict/1761875714    -> viable/strict/1761875714
2025-12-04T08:57:06.6206770Z  * [new tag]                 viable/strict/1761878924    -> viable/strict/1761878924
2025-12-04T08:57:06.6208160Z  * [new tag]                 viable/strict/1761881727    -> viable/strict/1761881727
2025-12-04T08:57:06.6209582Z  * [new tag]                 viable/strict/1761882959    -> viable/strict/1761882959
2025-12-04T08:57:06.6211732Z  * [new tag]                 viable/strict/1761886268    -> viable/strict/1761886268
2025-12-04T08:57:06.6213154Z  * [new tag]                 viable/strict/1761893641    -> viable/strict/1761893641
2025-12-04T08:57:06.6214145Z  * [new tag]                 viable/strict/1761931517    -> viable/strict/1761931517
2025-12-04T08:57:06.6215676Z  * [new tag]                 viable/strict/1761933080    -> viable/strict/1761933080
2025-12-04T08:57:06.6217161Z  * [new tag]                 viable/strict/1761935217    -> viable/strict/1761935217
2025-12-04T08:57:06.6218615Z  * [new tag]                 viable/strict/1761938533    -> viable/strict/1761938533
2025-12-04T08:57:06.6219983Z  * [new tag]                 viable/strict/1761940184    -> viable/strict/1761940184
2025-12-04T08:57:06.6221387Z  * [new tag]                 viable/strict/1761942338    -> viable/strict/1761942338
2025-12-04T08:57:06.6222664Z  * [new tag]                 viable/strict/1761946100    -> viable/strict/1761946100
2025-12-04T08:57:06.6224046Z  * [new tag]                 viable/strict/1761947374    -> viable/strict/1761947374
2025-12-04T08:57:06.6225387Z  * [new tag]                 viable/strict/1761950978    -> viable/strict/1761950978
2025-12-04T08:57:06.6226797Z  * [new tag]                 viable/strict/1761957727    -> viable/strict/1761957727
2025-12-04T08:57:06.6228128Z  * [new tag]                 viable/strict/1761959532    -> viable/strict/1761959532
2025-12-04T08:57:06.6229665Z  * [new tag]                 viable/strict/1761965366    -> viable/strict/1761965366
2025-12-04T08:57:06.6231074Z  * [new tag]                 viable/strict/1761968066    -> viable/strict/1761968066
2025-12-04T08:57:06.6232395Z  * [new tag]                 viable/strict/1761969322    -> viable/strict/1761969322
2025-12-04T08:57:06.6233717Z  * [new tag]                 viable/strict/1761974723    -> viable/strict/1761974723
2025-12-04T08:57:06.6235098Z  * [new tag]                 viable/strict/1761981837    -> viable/strict/1761981837
2025-12-04T08:57:06.6236520Z  * [new tag]                 viable/strict/1761985546    -> viable/strict/1761985546
2025-12-04T08:57:06.6237951Z  * [new tag]                 viable/strict/1761987030    -> viable/strict/1761987030
2025-12-04T08:57:06.6239364Z  * [new tag]                 viable/strict/1762003554    -> viable/strict/1762003554
2025-12-04T08:57:06.6240827Z  * [new tag]                 viable/strict/1762021560    -> viable/strict/1762021560
2025-12-04T08:57:06.6242187Z  * [new tag]                 viable/strict/1762032190    -> viable/strict/1762032190
2025-12-04T08:57:06.6243579Z  * [new tag]                 viable/strict/1762040981    -> viable/strict/1762040981
2025-12-04T08:57:06.6244921Z  * [new tag]                 viable/strict/1762048525    -> viable/strict/1762048525
2025-12-04T08:57:06.6246273Z  * [new tag]                 viable/strict/1762104223    -> viable/strict/1762104223
2025-12-04T08:57:06.6247657Z  * [new tag]                 viable/strict/1762105778    -> viable/strict/1762105778
2025-12-04T08:57:06.6248978Z  * [new tag]                 viable/strict/1762115109    -> viable/strict/1762115109
2025-12-04T08:57:06.6250327Z  * [new tag]                 viable/strict/1762125840    -> viable/strict/1762125840
2025-12-04T08:57:06.6251480Z  * [new tag]                 viable/strict/1762127377    -> viable/strict/1762127377
2025-12-04T08:57:06.6253163Z  * [new tag]                 viable/strict/1762134925    -> viable/strict/1762134925
2025-12-04T08:57:06.6254422Z  * [new tag]                 viable/strict/1762138338    -> viable/strict/1762138338
2025-12-04T08:57:06.6255780Z  * [new tag]                 viable/strict/1762148993    -> viable/strict/1762148993
2025-12-04T08:57:06.6257215Z  * [new tag]                 viable/strict/1762152871    -> viable/strict/1762152871
2025-12-04T08:57:06.6258600Z  * [new tag]                 viable/strict/1762156183    -> viable/strict/1762156183
2025-12-04T08:57:06.6259987Z  * [new tag]                 viable/strict/1762163457    -> viable/strict/1762163457
2025-12-04T08:57:06.6261304Z  * [new tag]                 viable/strict/1762165569    -> viable/strict/1762165569
2025-12-04T08:57:06.6262773Z  * [new tag]                 viable/strict/1762169035    -> viable/strict/1762169035
2025-12-04T08:57:06.6264171Z  * [new tag]                 viable/strict/1762174936    -> viable/strict/1762174936
2025-12-04T08:57:06.6265504Z  * [new tag]                 viable/strict/1762194412    -> viable/strict/1762194412
2025-12-04T08:57:06.6266872Z  * [new tag]                 viable/strict/1762195876    -> viable/strict/1762195876
2025-12-04T08:57:06.6268284Z  * [new tag]                 viable/strict/1762197788    -> viable/strict/1762197788
2025-12-04T08:57:06.6269719Z  * [new tag]                 viable/strict/1762199389    -> viable/strict/1762199389
2025-12-04T08:57:06.6271193Z  * [new tag]                 viable/strict/1762206585    -> viable/strict/1762206585
2025-12-04T08:57:06.6273054Z  * [new tag]                 viable/strict/1762210184    -> viable/strict/1762210184
2025-12-04T08:57:06.6274338Z  * [new tag]                 viable/strict/1762218736    -> viable/strict/1762218736
2025-12-04T08:57:06.6275728Z  * [new tag]                 viable/strict/1762224529    -> viable/strict/1762224529
2025-12-04T08:57:06.6277119Z  * [new tag]                 viable/strict/1762227253    -> viable/strict/1762227253
2025-12-04T08:57:06.6278373Z  * [new tag]                 viable/strict/1762228515    -> viable/strict/1762228515
2025-12-04T08:57:06.6280048Z  * [new tag]                 viable/strict/1762230349    -> viable/strict/1762230349
2025-12-04T08:57:06.6281332Z  * [new tag]                 viable/strict/1762231859    -> viable/strict/1762231859
2025-12-04T08:57:06.6282727Z  * [new tag]                 viable/strict/1762233925    -> viable/strict/1762233925
2025-12-04T08:57:06.6284185Z  * [new tag]                 viable/strict/1762237630    -> viable/strict/1762237630
2025-12-04T08:57:06.6285456Z  * [new tag]                 viable/strict/1762253522    -> viable/strict/1762253522
2025-12-04T08:57:06.6286971Z  * [new tag]                 viable/strict/1762278588    -> viable/strict/1762278588
2025-12-04T08:57:06.6288364Z  * [new tag]                 viable/strict/1762284203    -> viable/strict/1762284203
2025-12-04T08:57:06.6289720Z  * [new tag]                 viable/strict/1762289446    -> viable/strict/1762289446
2025-12-04T08:57:06.6291069Z  * [new tag]                 viable/strict/1762291515    -> viable/strict/1762291515
2025-12-04T08:57:06.6292455Z  * [new tag]                 viable/strict/1762295100    -> viable/strict/1762295100
2025-12-04T08:57:06.6293621Z  * [new tag]                 viable/strict/1762296590    -> viable/strict/1762296590
2025-12-04T08:57:06.6294889Z  * [new tag]                 viable/strict/1762300179    -> viable/strict/1762300179
2025-12-04T08:57:06.6296054Z  * [new tag]                 viable/strict/1762303207    -> viable/strict/1762303207
2025-12-04T08:57:06.6297509Z  * [new tag]                 viable/strict/1762386584    -> viable/strict/1762386584
2025-12-04T08:57:06.6298900Z  * [new tag]                 viable/strict/1762391537    -> viable/strict/1762391537
2025-12-04T08:57:06.6300147Z  * [new tag]                 viable/strict/1762394119    -> viable/strict/1762394119
2025-12-04T08:57:06.6301712Z  * [new tag]                 viable/strict/1762397437    -> viable/strict/1762397437
2025-12-04T08:57:06.6303164Z  * [new tag]                 viable/strict/1762400256    -> viable/strict/1762400256
2025-12-04T08:57:06.6304490Z  * [new tag]                 viable/strict/1762401469    -> viable/strict/1762401469
2025-12-04T08:57:06.6305849Z  * [new tag]                 viable/strict/1762408195    -> viable/strict/1762408195
2025-12-04T08:57:06.6307226Z  * [new tag]                 viable/strict/1762410411    -> viable/strict/1762410411
2025-12-04T08:57:06.6308741Z  * [new tag]                 viable/strict/1762417613    -> viable/strict/1762417613
2025-12-04T08:57:06.6310114Z  * [new tag]                 viable/strict/1762419198    -> viable/strict/1762419198
2025-12-04T08:57:06.6311472Z  * [new tag]                 viable/strict/1762422656    -> viable/strict/1762422656
2025-12-04T08:57:06.6313092Z  * [new tag]                 viable/strict/1762424746    -> viable/strict/1762424746
2025-12-04T08:57:06.6314551Z  * [new tag]                 viable/strict/1762446386    -> viable/strict/1762446386
2025-12-04T08:57:06.6315915Z  * [new tag]                 viable/strict/1762449912    -> viable/strict/1762449912
2025-12-04T08:57:06.6317284Z  * [new tag]                 viable/strict/1762457031    -> viable/strict/1762457031
2025-12-04T08:57:06.6318796Z  * [new tag]                 viable/strict/1762462441    -> viable/strict/1762462441
2025-12-04T08:57:06.6320293Z  * [new tag]                 viable/strict/1762467909    -> viable/strict/1762467909
2025-12-04T08:57:06.6321677Z  * [new tag]                 viable/strict/1762471493    -> viable/strict/1762471493
2025-12-04T08:57:06.6323054Z  * [new tag]                 viable/strict/1762475990    -> viable/strict/1762475990
2025-12-04T08:57:06.6324497Z  * [new tag]                 viable/strict/1762477933    -> viable/strict/1762477933
2025-12-04T08:57:06.6325949Z  * [new tag]                 viable/strict/1762491053    -> viable/strict/1762491053
2025-12-04T08:57:06.6327284Z  * [new tag]                 viable/strict/1762493118    -> viable/strict/1762493118
2025-12-04T08:57:06.6328692Z  * [new tag]                 viable/strict/1762498442    -> viable/strict/1762498442
2025-12-04T08:57:06.6330201Z  * [new tag]                 viable/strict/1762501778    -> viable/strict/1762501778
2025-12-04T08:57:06.6331534Z  * [new tag]                 viable/strict/1762504001    -> viable/strict/1762504001
2025-12-04T08:57:06.6332922Z  * [new tag]                 viable/strict/1762505583    -> viable/strict/1762505583
2025-12-04T08:57:06.6334324Z  * [new tag]                 viable/strict/1762507523    -> viable/strict/1762507523
2025-12-04T08:57:06.6335751Z  * [new tag]                 viable/strict/1762511140    -> viable/strict/1762511140
2025-12-04T08:57:06.6337194Z  * [new tag]                 viable/strict/1762512632    -> viable/strict/1762512632
2025-12-04T08:57:06.6338586Z  * [new tag]                 viable/strict/1762520467    -> viable/strict/1762520467
2025-12-04T08:57:06.6339972Z  * [new tag]                 viable/strict/1762522016    -> viable/strict/1762522016
2025-12-04T08:57:06.6341365Z  * [new tag]                 viable/strict/1762530591    -> viable/strict/1762530591
2025-12-04T08:57:06.6342743Z  * [new tag]                 viable/strict/1762543405    -> viable/strict/1762543405
2025-12-04T08:57:06.6343986Z  * [new tag]                 viable/strict/1762544998    -> viable/strict/1762544998
2025-12-04T08:57:06.6345324Z  * [new tag]                 viable/strict/1762552182    -> viable/strict/1762552182
2025-12-04T08:57:06.6346686Z  * [new tag]                 viable/strict/1762554297    -> viable/strict/1762554297
2025-12-04T08:57:06.6347845Z  * [new tag]                 viable/strict/1762559381    -> viable/strict/1762559381
2025-12-04T08:57:06.6349268Z  * [new tag]                 viable/strict/1762562222    -> viable/strict/1762562222
2025-12-04T08:57:06.6350680Z  * [new tag]                 viable/strict/1762564319    -> viable/strict/1762564319
2025-12-04T08:57:06.6351951Z  * [new tag]                 viable/strict/1762566904    -> viable/strict/1762566904
2025-12-04T08:57:06.6353272Z  * [new tag]                 viable/strict/1762569781    -> viable/strict/1762569781
2025-12-04T08:57:06.6354617Z  * [new tag]                 viable/strict/1762575940    -> viable/strict/1762575940
2025-12-04T08:57:06.6355975Z  * [new tag]                 viable/strict/1762580974    -> viable/strict/1762580974
2025-12-04T08:57:06.6357369Z  * [new tag]                 viable/strict/1762583185    -> viable/strict/1762583185
2025-12-04T08:57:06.6358708Z  * [new tag]                 viable/strict/1762586647    -> viable/strict/1762586647
2025-12-04T08:57:06.6360218Z  * [new tag]                 viable/strict/1762588183    -> viable/strict/1762588183
2025-12-04T08:57:06.6361989Z  * [new tag]                 viable/strict/1762593886    -> viable/strict/1762593886
2025-12-04T08:57:06.6363413Z  * [new tag]                 viable/strict/1762650743    -> viable/strict/1762650743
2025-12-04T08:57:06.6364801Z  * [new tag]                 viable/strict/1762653328    -> viable/strict/1762653328
2025-12-04T08:57:06.6366170Z  * [new tag]                 viable/strict/1762659342    -> viable/strict/1762659342
2025-12-04T08:57:06.6367547Z  * [new tag]                 viable/strict/1762662360    -> viable/strict/1762662360
2025-12-04T08:57:06.6368965Z  * [new tag]                 viable/strict/1762667377    -> viable/strict/1762667377
2025-12-04T08:57:06.6370310Z  * [new tag]                 viable/strict/1762671090    -> viable/strict/1762671090
2025-12-04T08:57:06.6371655Z  * [new tag]                 viable/strict/1762680284    -> viable/strict/1762680284
2025-12-04T08:57:06.6373084Z  * [new tag]                 viable/strict/1762683900    -> viable/strict/1762683900
2025-12-04T08:57:06.6374471Z  * [new tag]                 viable/strict/1762705541    -> viable/strict/1762705541
2025-12-04T08:57:06.6375816Z  * [new tag]                 viable/strict/1762709004    -> viable/strict/1762709004
2025-12-04T08:57:06.6377186Z  * [new tag]                 viable/strict/1762746004    -> viable/strict/1762746004
2025-12-04T08:57:06.6378633Z  * [new tag]                 viable/strict/1762748799    -> viable/strict/1762748799
2025-12-04T08:57:06.6380170Z  * [new tag]                 viable/strict/1762759504    -> viable/strict/1762759504
2025-12-04T08:57:06.6381590Z  * [new tag]                 viable/strict/1762760973    -> viable/strict/1762760973
2025-12-04T08:57:06.6382906Z  * [new tag]                 viable/strict/1762775374    -> viable/strict/1762775374
2025-12-04T08:57:06.6384268Z  * [new tag]                 viable/strict/1762777661    -> viable/strict/1762777661
2025-12-04T08:57:06.6385645Z  * [new tag]                 viable/strict/1762779774    -> viable/strict/1762779774
2025-12-04T08:57:06.6387182Z  * [new tag]                 viable/strict/1762781259    -> viable/strict/1762781259
2025-12-04T08:57:06.6388602Z  * [new tag]                 viable/strict/1762793628    -> viable/strict/1762793628
2025-12-04T08:57:06.6389999Z  * [new tag]                 viable/strict/1762800711    -> viable/strict/1762800711
2025-12-04T08:57:06.6391391Z  * [new tag]                 viable/strict/1762809894    -> viable/strict/1762809894
2025-12-04T08:57:06.6392704Z  * [new tag]                 viable/strict/1762811384    -> viable/strict/1762811384
2025-12-04T08:57:06.6394146Z  * [new tag]                 viable/strict/1762813841    -> viable/strict/1762813841
2025-12-04T08:57:06.6395527Z  * [new tag]                 viable/strict/1762815047    -> viable/strict/1762815047
2025-12-04T08:57:06.6397022Z  * [new tag]                 viable/strict/1762817094    -> viable/strict/1762817094
2025-12-04T08:57:06.6398430Z  * [new tag]                 viable/strict/1762818582    -> viable/strict/1762818582
2025-12-04T08:57:06.6399806Z  * [new tag]                 viable/strict/1762821623    -> viable/strict/1762821623
2025-12-04T08:57:06.6401188Z  * [new tag]                 viable/strict/1762823531    -> viable/strict/1762823531
2025-12-04T08:57:06.6402631Z  * [new tag]                 viable/strict/1762849583    -> viable/strict/1762849583
2025-12-04T08:57:06.6403956Z  * [new tag]                 viable/strict/1762851200    -> viable/strict/1762851200
2025-12-04T08:57:06.6405296Z  * [new tag]                 viable/strict/1762854603    -> viable/strict/1762854603
2025-12-04T08:57:06.6406689Z  * [new tag]                 viable/strict/1762858276    -> viable/strict/1762858276
2025-12-04T08:57:06.6408124Z  * [new tag]                 viable/strict/1762860891    -> viable/strict/1762860891
2025-12-04T08:57:06.6409846Z  * [new tag]                 viable/strict/1762866174    -> viable/strict/1762866174
2025-12-04T08:57:06.6411308Z  * [new tag]                 viable/strict/1762867653    -> viable/strict/1762867653
2025-12-04T08:57:06.6412687Z  * [new tag]                 viable/strict/1762872669    -> viable/strict/1762872669
2025-12-04T08:57:06.6413968Z  * [new tag]                 viable/strict/1762878380    -> viable/strict/1762878380
2025-12-04T08:57:06.6415308Z  * [new tag]                 viable/strict/1762889003    -> viable/strict/1762889003
2025-12-04T08:57:06.6416723Z  * [new tag]                 viable/strict/1762890589    -> viable/strict/1762890589
2025-12-04T08:57:06.6420349Z  * [new tag]                 viable/strict/1762892743    -> viable/strict/1762892743
2025-12-04T08:57:06.6421798Z  * [new tag]                 viable/strict/1762894271    -> viable/strict/1762894271
2025-12-04T08:57:06.6422993Z  * [new tag]                 viable/strict/1762896287    -> viable/strict/1762896287
2025-12-04T08:57:06.6424358Z  * [new tag]                 viable/strict/1762915871    -> viable/strict/1762915871
2025-12-04T08:57:06.6425735Z  * [new tag]                 viable/strict/1762918569    -> viable/strict/1762918569
2025-12-04T08:57:06.6427015Z  * [new tag]                 viable/strict/1762919776    -> viable/strict/1762919776
2025-12-04T08:57:06.6428392Z  * [new tag]                 viable/strict/1762923072    -> viable/strict/1762923072
2025-12-04T08:57:06.6429776Z  * [new tag]                 viable/strict/1762928826    -> viable/strict/1762928826
2025-12-04T08:57:06.6431345Z  * [new tag]                 viable/strict/1762930451    -> viable/strict/1762930451
2025-12-04T08:57:06.6432923Z  * [new tag]                 viable/strict/1762933780    -> viable/strict/1762933780
2025-12-04T08:57:06.6434177Z  * [new tag]                 viable/strict/1762937638    -> viable/strict/1762937638
2025-12-04T08:57:06.6435679Z  * [new tag]                 viable/strict/1762939545    -> viable/strict/1762939545
2025-12-04T08:57:06.6437075Z  * [new tag]                 viable/strict/1762962692    -> viable/strict/1762962692
2025-12-04T08:57:06.6438466Z  * [new tag]                 viable/strict/1762979143    -> viable/strict/1762979143
2025-12-04T08:57:06.6439879Z  * [new tag]                 viable/strict/1762984188    -> viable/strict/1762984188
2025-12-04T08:57:06.6441258Z  * [new tag]                 viable/strict/1762986306    -> viable/strict/1762986306
2025-12-04T08:57:06.6442640Z  * [new tag]                 viable/strict/1762989903    -> viable/strict/1762989903
2025-12-04T08:57:06.6444002Z  * [new tag]                 viable/strict/1762991377    -> viable/strict/1762991377
2025-12-04T08:57:06.6445361Z  * [new tag]                 viable/strict/1762998921    -> viable/strict/1762998921
2025-12-04T08:57:06.6446755Z  * [new tag]                 viable/strict/1763002287    -> viable/strict/1763002287
2025-12-04T08:57:06.6448194Z  * [new tag]                 viable/strict/1763016840    -> viable/strict/1763016840
2025-12-04T08:57:06.6449550Z  * [new tag]                 viable/strict/1763020180    -> viable/strict/1763020180
2025-12-04T08:57:06.6450926Z  * [new tag]                 viable/strict/1763027421    -> viable/strict/1763027421
2025-12-04T08:57:06.6452284Z  * [new tag]                 viable/strict/1763031120    -> viable/strict/1763031120
2025-12-04T08:57:06.6454071Z  * [new tag]                 viable/strict/1763036861    -> viable/strict/1763036861
2025-12-04T08:57:06.6455496Z  * [new tag]                 viable/strict/1763038993    -> viable/strict/1763038993
2025-12-04T08:57:06.6456886Z  * [new tag]                 viable/strict/1763054703    -> viable/strict/1763054703
2025-12-04T08:57:06.6458143Z  * [new tag]                 viable/strict/1763067061    -> viable/strict/1763067061
2025-12-04T08:57:06.6459525Z  * [new tag]                 viable/strict/1763070847    -> viable/strict/1763070847
2025-12-04T08:57:06.6460890Z  * [new tag]                 viable/strict/1763072706    -> viable/strict/1763072706
2025-12-04T08:57:06.6462330Z  * [new tag]                 viable/strict/1763076302    -> viable/strict/1763076302
2025-12-04T08:57:06.6463691Z  * [new tag]                 viable/strict/1763080816    -> viable/strict/1763080816
2025-12-04T08:57:06.6465166Z  * [new tag]                 viable/strict/1763082732    -> viable/strict/1763082732
2025-12-04T08:57:06.6466541Z  * [new tag]                 viable/strict/1763085329    -> viable/strict/1763085329
2025-12-04T08:57:06.6467929Z  * [new tag]                 viable/strict/1763088623    -> viable/strict/1763088623
2025-12-04T08:57:06.6469391Z  * [new tag]                 viable/strict/1763091402    -> viable/strict/1763091402
2025-12-04T08:57:06.6470801Z  * [new tag]                 viable/strict/1763092602    -> viable/strict/1763092602
2025-12-04T08:57:06.6472156Z  * [new tag]                 viable/strict/1763094355    -> viable/strict/1763094355
2025-12-04T08:57:06.6473513Z  * [new tag]                 viable/strict/1763099390    -> viable/strict/1763099390
2025-12-04T08:57:06.6474882Z  * [new tag]                 viable/strict/1763101608    -> viable/strict/1763101608
2025-12-04T08:57:06.6476333Z  * [new tag]                 viable/strict/1763105102    -> viable/strict/1763105102
2025-12-04T08:57:06.6477723Z  * [new tag]                 viable/strict/1763112347    -> viable/strict/1763112347
2025-12-04T08:57:06.6479076Z  * [new tag]                 viable/strict/1763119471    -> viable/strict/1763119471
2025-12-04T08:57:06.6480392Z  * [new tag]                 viable/strict/1763126835    -> viable/strict/1763126835
2025-12-04T08:57:06.6481670Z  * [new tag]                 viable/strict/1763149779    -> viable/strict/1763149779
2025-12-04T08:57:06.6483127Z  * [new tag]                 viable/strict/1763164178    -> viable/strict/1763164178
2025-12-04T08:57:06.6484393Z  * [new tag]                 viable/strict/1763167104    -> viable/strict/1763167104
2025-12-04T08:57:06.6485742Z  * [new tag]                 viable/strict/1763169132    -> viable/strict/1763169132
2025-12-04T08:57:06.6487135Z  * [new tag]                 viable/strict/1763171708    -> viable/strict/1763171708
2025-12-04T08:57:06.6488444Z  * [new tag]                 viable/strict/1763174759    -> viable/strict/1763174759
2025-12-04T08:57:06.6489812Z  * [new tag]                 viable/strict/1763180744    -> viable/strict/1763180744
2025-12-04T08:57:06.6491160Z  * [new tag]                 viable/strict/1763182227    -> viable/strict/1763182227
2025-12-04T08:57:06.6492551Z  * [new tag]                 viable/strict/1763184309    -> viable/strict/1763184309
2025-12-04T08:57:06.6494249Z  * [new tag]                 viable/strict/1763187991    -> viable/strict/1763187991
2025-12-04T08:57:06.6495652Z  * [new tag]                 viable/strict/1763191445    -> viable/strict/1763191445
2025-12-04T08:57:06.6497158Z  * [new tag]                 viable/strict/1763195152    -> viable/strict/1763195152
2025-12-04T08:57:06.6498522Z  * [new tag]                 viable/strict/1763205769    -> viable/strict/1763205769
2025-12-04T08:57:06.6499828Z  * [new tag]                 viable/strict/1763246990    -> viable/strict/1763246990
2025-12-04T08:57:06.6501225Z  * [new tag]                 viable/strict/1763261578    -> viable/strict/1763261578
2025-12-04T08:57:06.6502515Z  * [new tag]                 viable/strict/1763286573    -> viable/strict/1763286573
2025-12-04T08:57:06.6503802Z  * [new tag]                 viable/strict/1763292167    -> viable/strict/1763292167
2025-12-04T08:57:06.6505167Z  * [new tag]                 viable/strict/1763333386    -> viable/strict/1763333386
2025-12-04T08:57:06.6506515Z  * [new tag]                 viable/strict/1763340082    -> viable/strict/1763340082
2025-12-04T08:57:06.6508515Z  * [new tag]                 viable/strict/1763364324    -> viable/strict/1763364324
2025-12-04T08:57:06.6510004Z  * [new tag]                 viable/strict/1763371569    -> viable/strict/1763371569
2025-12-04T08:57:06.6511357Z  * [new tag]                 viable/strict/1763373067    -> viable/strict/1763373067
2025-12-04T08:57:06.6512675Z  * [new tag]                 viable/strict/1763375157    -> viable/strict/1763375157
2025-12-04T08:57:06.6514041Z  * [new tag]                 viable/strict/1763382462    -> viable/strict/1763382462
2025-12-04T08:57:06.6515492Z  * [new tag]                 viable/strict/1763394661    -> viable/strict/1763394661
2025-12-04T08:57:06.6516958Z  * [new tag]                 viable/strict/1763396797    -> viable/strict/1763396797
2025-12-04T08:57:06.6518633Z  * [new tag]                 viable/strict/1763398542    -> viable/strict/1763398542
2025-12-04T08:57:06.6520113Z  * [new tag]                 viable/strict/1763401807    -> viable/strict/1763401807
2025-12-04T08:57:06.6521389Z  * [new tag]                 viable/strict/1763414698    -> viable/strict/1763414698
2025-12-04T08:57:06.6522808Z  * [new tag]                 viable/strict/1763419807    -> viable/strict/1763419807
2025-12-04T08:57:06.6524177Z  * [new tag]                 viable/strict/1763426369    -> viable/strict/1763426369
2025-12-04T08:57:06.6525579Z  * [new tag]                 viable/strict/1763428331    -> viable/strict/1763428331
2025-12-04T08:57:06.6526980Z  * [new tag]                 viable/strict/1763430922    -> viable/strict/1763430922
2025-12-04T08:57:06.6528233Z  * [new tag]                 viable/strict/1763434184    -> viable/strict/1763434184
2025-12-04T08:57:06.6529612Z  * [new tag]                 viable/strict/1763439973    -> viable/strict/1763439973
2025-12-04T08:57:06.6530993Z  * [new tag]                 viable/strict/1763444995    -> viable/strict/1763444995
2025-12-04T08:57:06.6532499Z  * [new tag]                 viable/strict/1763447206    -> viable/strict/1763447206
2025-12-04T08:57:06.6533814Z  * [new tag]                 viable/strict/1763448826    -> viable/strict/1763448826
2025-12-04T08:57:06.6535135Z  * [new tag]                 viable/strict/1763450717    -> viable/strict/1763450717
2025-12-04T08:57:06.6536517Z  * [new tag]                 viable/strict/1763452183    -> viable/strict/1763452183
2025-12-04T08:57:06.6537902Z  * [new tag]                 viable/strict/1763457945    -> viable/strict/1763457945
2025-12-04T08:57:06.6539246Z  * [new tag]                 viable/strict/1763459439    -> viable/strict/1763459439
2025-12-04T08:57:06.6540509Z  * [new tag]                 viable/strict/1763461556    -> viable/strict/1763461556
2025-12-04T08:57:06.6541884Z  * [new tag]                 viable/strict/1763463103    -> viable/strict/1763463103
2025-12-04T08:57:06.6543650Z  * [new tag]                 viable/strict/1763465100    -> viable/strict/1763465100
2025-12-04T08:57:06.6544961Z  * [new tag]                 viable/strict/1763468866    -> viable/strict/1763468866
2025-12-04T08:57:06.6546212Z  * [new tag]                 viable/strict/1763493823    -> viable/strict/1763493823
2025-12-04T08:57:06.6547482Z  * [new tag]                 viable/strict/1763496249    -> viable/strict/1763496249
2025-12-04T08:57:06.6548931Z  * [new tag]                 viable/strict/1763502620    -> viable/strict/1763502620
2025-12-04T08:57:06.6550335Z  * [new tag]                 viable/strict/1763504715    -> viable/strict/1763504715
2025-12-04T08:57:06.6551696Z  * [new tag]                 viable/strict/1763506208    -> viable/strict/1763506208
2025-12-04T08:57:06.6553122Z  * [new tag]                 viable/strict/1763520590    -> viable/strict/1763520590
2025-12-04T08:57:06.6554597Z  * [new tag]                 viable/strict/1763523357    -> viable/strict/1763523357
2025-12-04T08:57:06.6556004Z  * [new tag]                 viable/strict/1763529922    -> viable/strict/1763529922
2025-12-04T08:57:06.6557422Z  * [new tag]                 viable/strict/1763531408    -> viable/strict/1763531408
2025-12-04T08:57:06.6558848Z  * [new tag]                 viable/strict/1763533622    -> viable/strict/1763533622
2025-12-04T08:57:06.6560271Z  * [new tag]                 viable/strict/1763538576    -> viable/strict/1763538576
2025-12-04T08:57:06.6561696Z  * [new tag]                 viable/strict/1763545823    -> viable/strict/1763545823
2025-12-04T08:57:06.6562850Z  * [new tag]                 viable/strict/1763547951    -> viable/strict/1763547951
2025-12-04T08:57:06.6564380Z  * [new tag]                 viable/strict/1763551477    -> viable/strict/1763551477
2025-12-04T08:57:06.6565735Z  * [new tag]                 viable/strict/1763552982    -> viable/strict/1763552982
2025-12-04T08:57:06.6567100Z  * [new tag]                 viable/strict/1763594698    -> viable/strict/1763594698
2025-12-04T08:57:06.6568499Z  * [new tag]                 viable/strict/1763596178    -> viable/strict/1763596178
2025-12-04T08:57:06.6569925Z  * [new tag]                 viable/strict/1763599155    -> viable/strict/1763599155
2025-12-04T08:57:06.6571243Z  * [new tag]                 viable/strict/1763603717    -> viable/strict/1763603717
2025-12-04T08:57:06.6572636Z  * [new tag]                 viable/strict/1763606923    -> viable/strict/1763606923
2025-12-04T08:57:06.6574001Z  * [new tag]                 viable/strict/1763609715    -> viable/strict/1763609715
2025-12-04T08:57:06.6575403Z  * [new tag]                 viable/strict/1763612757    -> viable/strict/1763612757
2025-12-04T08:57:06.6576741Z  * [new tag]                 viable/strict/1763616325    -> viable/strict/1763616325
2025-12-04T08:57:06.6578182Z  * [new tag]                 viable/strict/1763623509    -> viable/strict/1763623509
2025-12-04T08:57:06.6580106Z  * [new tag]                 viable/strict/1763624984    -> viable/strict/1763624984
2025-12-04T08:57:06.6580981Z  * [new tag]                 viable/strict/1763628796    -> viable/strict/1763628796
2025-12-04T08:57:06.6582484Z  * [new tag]                 viable/strict/1763634343    -> viable/strict/1763634343
2025-12-04T08:57:06.6583775Z  * [new tag]                 viable/strict/1763635867    -> viable/strict/1763635867
2025-12-04T08:57:06.6585406Z  * [new tag]                 viable/strict/1763639382    -> viable/strict/1763639382
2025-12-04T08:57:06.6586656Z  * [new tag]                 viable/strict/1763646626    -> viable/strict/1763646626
2025-12-04T08:57:06.6588130Z  * [new tag]                 viable/strict/1763655997    -> viable/strict/1763655997
2025-12-04T08:57:06.6589497Z  * [new tag]                 viable/strict/1763659444    -> viable/strict/1763659444
2025-12-04T08:57:06.6590898Z  * [new tag]                 viable/strict/1763660992    -> viable/strict/1763660992
2025-12-04T08:57:06.6592234Z  * [new tag]                 viable/strict/1763663201    -> viable/strict/1763663201
2025-12-04T08:57:06.6593609Z  * [new tag]                 viable/strict/1763670362    -> viable/strict/1763670362
2025-12-04T08:57:06.6594873Z  * [new tag]                 viable/strict/1763675378    -> viable/strict/1763675378
2025-12-04T08:57:06.6596294Z  * [new tag]                 viable/strict/1763693343    -> viable/strict/1763693343
2025-12-04T08:57:06.6597635Z  * [new tag]                 viable/strict/1763696088    -> viable/strict/1763696088
2025-12-04T08:57:06.6599128Z  * [new tag]                 viable/strict/1763697343    -> viable/strict/1763697343
2025-12-04T08:57:06.6600632Z  * [new tag]                 viable/strict/1763699165    -> viable/strict/1763699165
2025-12-04T08:57:06.6602049Z  * [new tag]                 viable/strict/1763700660    -> viable/strict/1763700660
2025-12-04T08:57:06.6603408Z  * [new tag]                 viable/strict/1763704209    -> viable/strict/1763704209
2025-12-04T08:57:06.6604778Z  * [new tag]                 viable/strict/1763706411    -> viable/strict/1763706411
2025-12-04T08:57:06.6606116Z  * [new tag]                 viable/strict/1763708082    -> viable/strict/1763708082
2025-12-04T08:57:06.6607446Z  * [new tag]                 viable/strict/1763711381    -> viable/strict/1763711381
2025-12-04T08:57:06.6608727Z  * [new tag]                 viable/strict/1763713593    -> viable/strict/1763713593
2025-12-04T08:57:06.6610120Z  * [new tag]                 viable/strict/1763715201    -> viable/strict/1763715201
2025-12-04T08:57:06.6611465Z  * [new tag]                 viable/strict/1763733017    -> viable/strict/1763733017
2025-12-04T08:57:06.6612860Z  * [new tag]                 viable/strict/1763735108    -> viable/strict/1763735108
2025-12-04T08:57:06.6614257Z  * [new tag]                 viable/strict/1763749579    -> viable/strict/1763749579
2025-12-04T08:57:06.6615620Z  * [new tag]                 viable/strict/1763751113    -> viable/strict/1763751113
2025-12-04T08:57:06.6617133Z  * [new tag]                 viable/strict/1763753035    -> viable/strict/1763753035
2025-12-04T08:57:06.6618586Z  * [new tag]                 viable/strict/1763754578    -> viable/strict/1763754578
2025-12-04T08:57:06.6620014Z  * [new tag]                 viable/strict/1763756748    -> viable/strict/1763756748
2025-12-04T08:57:06.6621396Z  * [new tag]                 viable/strict/1763758205    -> viable/strict/1763758205
2025-12-04T08:57:06.6622879Z  * [new tag]                 viable/strict/1763764050    -> viable/strict/1763764050
2025-12-04T08:57:06.6624374Z  * [new tag]                 viable/strict/1763771887    -> viable/strict/1763771887
2025-12-04T08:57:06.6625870Z  * [new tag]                 viable/strict/1763773920    -> viable/strict/1763773920
2025-12-04T08:57:06.6627231Z  * [new tag]                 viable/strict/1763776501    -> viable/strict/1763776501
2025-12-04T08:57:06.6628574Z  * [new tag]                 viable/strict/1763779437    -> viable/strict/1763779437
2025-12-04T08:57:06.6630109Z  * [new tag]                 viable/strict/1763781038    -> viable/strict/1763781038
2025-12-04T08:57:06.6631395Z  * [new tag]                 viable/strict/1763782245    -> viable/strict/1763782245
2025-12-04T08:57:06.6633410Z  * [new tag]                 viable/strict/1763785568    -> viable/strict/1763785568
2025-12-04T08:57:06.6634841Z  * [new tag]                 viable/strict/1763787006    -> viable/strict/1763787006
2025-12-04T08:57:06.6636230Z  * [new tag]                 viable/strict/1763789103    -> viable/strict/1763789103
2025-12-04T08:57:06.6637566Z  * [new tag]                 viable/strict/1763790578    -> viable/strict/1763790578
2025-12-04T08:57:06.6638905Z  * [new tag]                 viable/strict/1763796275    -> viable/strict/1763796275
2025-12-04T08:57:06.6640557Z  * [new tag]                 viable/strict/1763801465    -> viable/strict/1763801465
2025-12-04T08:57:06.6641993Z  * [new tag]                 viable/strict/1763803522    -> viable/strict/1763803522
2025-12-04T08:57:06.6643288Z  * [new tag]                 viable/strict/1763808581    -> viable/strict/1763808581
2025-12-04T08:57:06.6644740Z  * [new tag]                 viable/strict/1763840977    -> viable/strict/1763840977
2025-12-04T08:57:06.6646161Z  * [new tag]                 viable/strict/1763846659    -> viable/strict/1763846659
2025-12-04T08:57:06.6647598Z  * [new tag]                 viable/strict/1763872065    -> viable/strict/1763872065
2025-12-04T08:57:06.6648955Z  * [new tag]                 viable/strict/1763873648    -> viable/strict/1763873648
2025-12-04T08:57:06.6650345Z  * [new tag]                 viable/strict/1763875506    -> viable/strict/1763875506
2025-12-04T08:57:06.6651599Z  * [new tag]                 viable/strict/1763889904    -> viable/strict/1763889904
2025-12-04T08:57:06.6653025Z  * [new tag]                 viable/strict/1763930999    -> viable/strict/1763930999
2025-12-04T08:57:06.6654380Z  * [new tag]                 viable/strict/1763944964    -> viable/strict/1763944964
2025-12-04T08:57:06.6655525Z  * [new tag]                 viable/strict/1763958474    -> viable/strict/1763958474
2025-12-04T08:57:06.6656940Z  * [new tag]                 viable/strict/1763967263    -> viable/strict/1763967263
2025-12-04T08:57:06.6658381Z  * [new tag]                 viable/strict/1763972803    -> viable/strict/1763972803
2025-12-04T08:57:06.6659714Z  * [new tag]                 viable/strict/1763976376    -> viable/strict/1763976376
2025-12-04T08:57:06.6661103Z  * [new tag]                 viable/strict/1763989404    -> viable/strict/1763989404
2025-12-04T08:57:06.6662445Z  * [new tag]                 viable/strict/1763990887    -> viable/strict/1763990887
2025-12-04T08:57:06.6663865Z  * [new tag]                 viable/strict/1764019919    -> viable/strict/1764019919
2025-12-04T08:57:06.6665230Z  * [new tag]                 viable/strict/1764023134    -> viable/strict/1764023134
2025-12-04T08:57:06.6666466Z  * [new tag]                 viable/strict/1764024593    -> viable/strict/1764024593
2025-12-04T08:57:06.6667840Z  * [new tag]                 viable/strict/1764026706    -> viable/strict/1764026706
2025-12-04T08:57:06.6669335Z  * [new tag]                 viable/strict/1764031139    -> viable/strict/1764031139
2025-12-04T08:57:06.6670744Z  * [new tag]                 viable/strict/1764033131    -> viable/strict/1764033131
2025-12-04T08:57:06.6672076Z  * [new tag]                 viable/strict/1764035725    -> viable/strict/1764035725
2025-12-04T08:57:06.6673341Z  * [new tag]                 viable/strict/1764624265    -> viable/strict/1764624265
2025-12-04T08:57:06.6674498Z  * [new tag]                 viable/strict/1764631514    -> viable/strict/1764631514
2025-12-04T08:57:06.6675794Z  * [new tag]                 viable/strict/1764632987    -> viable/strict/1764632987
2025-12-04T08:57:06.6676943Z  * [new tag]                 viable/strict/1764636063    -> viable/strict/1764636063
2025-12-04T08:57:06.6678255Z  * [new tag]                 viable/strict/1764643975    -> viable/strict/1764643975
2025-12-04T08:57:06.6679366Z  * [new tag]                 viable/strict/1764646859    -> viable/strict/1764646859
2025-12-04T08:57:06.6680773Z  * [new tag]                 viable/strict/1764653120    -> viable/strict/1764653120
2025-12-04T08:57:06.6682150Z  * [new tag]                 viable/strict/1764654632    -> viable/strict/1764654632
2025-12-04T08:57:06.6683175Z  * [new tag]                 viable/strict/1764656821    -> viable/strict/1764656821
2025-12-04T08:57:06.6684514Z  * [new tag]                 viable/strict/1764658557    -> viable/strict/1764658557
2025-12-04T08:57:06.6685670Z  * [new tag]                 viable/strict/1764660333    -> viable/strict/1764660333
2025-12-04T08:57:06.6687035Z  * [new tag]                 viable/strict/1764661812    -> viable/strict/1764661812
2025-12-04T08:57:06.6688238Z  * [new tag]                 viable/strict/1764664023    -> viable/strict/1764664023
2025-12-04T08:57:06.6689485Z  * [new tag]                 viable/strict/1764669150    -> viable/strict/1764669150
2025-12-04T08:57:06.6690681Z  * [new tag]                 viable/strict/1764680709    -> viable/strict/1764680709
2025-12-04T08:57:06.6691929Z  * [new tag]                 viable/strict/1764687619    -> viable/strict/1764687619
2025-12-04T08:57:06.6693197Z  * [new tag]                 viable/strict/1764696355    -> viable/strict/1764696355
2025-12-04T08:57:06.6694388Z  * [new tag]                 viable/strict/1764701767    -> viable/strict/1764701767
2025-12-04T08:57:06.6695639Z  * [new tag]                 viable/strict/1764710768    -> viable/strict/1764710768
2025-12-04T08:57:06.6696800Z  * [new tag]                 viable/strict/1764716202    -> viable/strict/1764716202
2025-12-04T08:57:06.6698082Z  * [new tag]                 viable/strict/1764793566    -> viable/strict/1764793566
2025-12-04T08:57:06.6699228Z  * [new tag]                 viable/strict/1764797093    -> viable/strict/1764797093
2025-12-04T08:57:06.6700535Z  * [new tag]                 viable/strict/1764800729    -> viable/strict/1764800729
2025-12-04T08:57:06.6701851Z  * [new tag]                 whc_flight_1                -> whc_flight_1
2025-12-04T08:57:06.6703222Z  * [new tag]                 whc_flight_2                -> whc_flight_2
2025-12-04T08:57:06.6704648Z  * [new tag]                 whc_flight_4                -> whc_flight_4
2025-12-04T08:57:06.7549211Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object}
2025-12-04T08:57:06.7576907Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T08:57:06.7581561Z ##[endgroup]
2025-12-04T08:57:06.7581917Z ##[group]Determining the checkout info
2025-12-04T08:57:06.7582992Z ##[endgroup]
2025-12-04T08:57:06.7587478Z [command]/usr/bin/git sparse-checkout disable
2025-12-04T08:57:06.7624360Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig
2025-12-04T08:57:06.7650744Z ##[group]Checking out the ref
2025-12-04T08:57:06.7654711Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T08:57:07.8195896Z Updating files:  75% (15153/20121)
2025-12-04T08:57:07.8353066Z Updating files:  76% (15292/20121)
2025-12-04T08:57:07.8503864Z Updating files:  77% (15494/20121)
2025-12-04T08:57:07.8714854Z Updating files:  78% (15695/20121)
2025-12-04T08:57:07.8973352Z Updating files:  79% (15896/20121)
2025-12-04T08:57:07.9280649Z Updating files:  80% (16097/20121)
2025-12-04T08:57:07.9560573Z Updating files:  81% (16299/20121)
2025-12-04T08:57:07.9783983Z Updating files:  82% (16500/20121)
2025-12-04T08:57:07.9952536Z Updating files:  83% (16701/20121)
2025-12-04T08:57:08.0107869Z Updating files:  84% (16902/20121)
2025-12-04T08:57:08.0284937Z Updating files:  85% (17103/20121)
2025-12-04T08:57:08.0456387Z Updating files:  86% (17305/20121)
2025-12-04T08:57:08.0615676Z Updating files:  87% (17506/20121)
2025-12-04T08:57:08.0748301Z Updating files:  88% (17707/20121)
2025-12-04T08:57:08.0900816Z Updating files:  89% (17908/20121)
2025-12-04T08:57:08.1085251Z Updating files:  90% (18109/20121)
2025-12-04T08:57:08.1220953Z Updating files:  91% (18311/20121)
2025-12-04T08:57:08.1388361Z Updating files:  92% (18512/20121)
2025-12-04T08:57:08.1581249Z Updating files:  93% (18713/20121)
2025-12-04T08:57:08.1790415Z Updating files:  94% (18914/20121)
2025-12-04T08:57:08.1975645Z Updating files:  95% (19115/20121)
2025-12-04T08:57:08.2150258Z Updating files:  96% (19317/20121)
2025-12-04T08:57:08.2325808Z Updating files:  97% (19518/20121)
2025-12-04T08:57:08.2604839Z Updating files:  98% (19719/20121)
2025-12-04T08:57:08.2790512Z Updating files:  99% (19920/20121)
2025-12-04T08:57:08.2790825Z Updating files: 100% (20121/20121)
2025-12-04T08:57:08.2791100Z Updating files: 100% (20121/20121), done.
2025-12-04T08:57:08.3023309Z Note: switching to 'ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32'.
2025-12-04T08:57:08.3023625Z 
2025-12-04T08:57:08.3023839Z You are in 'detached HEAD' state. You can look around, make experimental
2025-12-04T08:57:08.3024347Z changes and commit them, and you can discard any commits you make in this
2025-12-04T08:57:08.3024842Z state without impacting any branches by switching back to a branch.
2025-12-04T08:57:08.3025126Z 
2025-12-04T08:57:08.3025317Z If you want to create a new branch to retain commits you create, you may
2025-12-04T08:57:08.3025800Z do so (now or later) by using -c with the switch command. Example:
2025-12-04T08:57:08.3026067Z 
2025-12-04T08:57:08.3026188Z   git switch -c <new-branch-name>
2025-12-04T08:57:08.3026376Z 
2025-12-04T08:57:08.3026475Z Or undo this operation with:
2025-12-04T08:57:08.3026637Z 
2025-12-04T08:57:08.3026725Z   git switch -
2025-12-04T08:57:08.3026848Z 
2025-12-04T08:57:08.3027061Z Turn off this advice by setting config variable advice.detachedHead to false
2025-12-04T08:57:08.3027374Z 
2025-12-04T08:57:08.3027615Z HEAD is now at ffd9b0fb435 Resolve collective autotuning test failure on arm (#168919)
2025-12-04T08:57:08.3148174Z ##[endgroup]
2025-12-04T08:57:08.3148852Z ##[group]Setting up auth for fetching submodules
2025-12-04T08:57:08.3155388Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T08:57:08.3208802Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf
2025-12-04T08:57:08.3241783Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com:
2025-12-04T08:57:08.3270525Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com:
2025-12-04T08:57:08.3295695Z ##[endgroup]
2025-12-04T08:57:08.3296513Z ##[group]Fetching submodules
2025-12-04T08:57:08.3300754Z [command]/usr/bin/git submodule sync --recursive
2025-12-04T08:57:08.3643921Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive
2025-12-04T08:57:08.3975077Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni'
2025-12-04T08:57:08.3977567Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16'
2025-12-04T08:57:08.3980670Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv'
2025-12-04T08:57:08.3983974Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK'
2025-12-04T08:57:08.3987295Z Submodule 'third_party/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path 'third_party/NVTX'
2025-12-04T08:57:08.3991045Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator'
2025-12-04T08:57:08.3994436Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK'
2025-12-04T08:57:08.3998073Z Submodule 'third_party/aiter' (https://github.com/ROCm/aiter.git) registered for path 'third_party/aiter'
2025-12-04T08:57:08.4002148Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark'
2025-12-04T08:57:08.4006101Z Submodule 'third_party/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel'
2025-12-04T08:57:08.4010052Z Submodule 'third_party/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'third_party/cpp-httplib'
2025-12-04T08:57:08.4013791Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo'
2025-12-04T08:57:08.4018449Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend'
2025-12-04T08:57:08.4022540Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass'
2025-12-04T08:57:08.4026709Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm'
2025-12-04T08:57:08.4031894Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention'
2025-12-04T08:57:08.4039094Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers'
2025-12-04T08:57:08.4043670Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt'
2025-12-04T08:57:08.4048258Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp'
2025-12-04T08:57:08.4052645Z Submodule 'third_party/gloo' (https://github.com/pytorch/gloo) registered for path 'third_party/gloo'
2025-12-04T08:57:08.4057363Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest'
2025-12-04T08:57:08.4061877Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep'
2025-12-04T08:57:08.4066694Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi'
2025-12-04T08:57:08.4071487Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto'
2025-12-04T08:57:08.4076509Z Submodule 'third_party/kleidiai' (https://github.com/ARM-software/kleidiai.git) registered for path 'third_party/kleidiai'
2025-12-04T08:57:08.4081582Z Submodule 'third_party/mimalloc' (https://github.com/microsoft/mimalloc.git) registered for path 'third_party/mimalloc'
2025-12-04T08:57:08.4086582Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann'
2025-12-04T08:57:08.4091668Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx'
2025-12-04T08:57:08.4097122Z Submodule 'third_party/opentelemetry-cpp' (https://github.com/open-telemetry/opentelemetry-cpp.git) registered for path 'third_party/opentelemetry-cpp'
2025-12-04T08:57:08.4102429Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft'
2025-12-04T08:57:08.4107840Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf'
2025-12-04T08:57:08.4113219Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd'
2025-12-04T08:57:08.4119279Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool'
2025-12-04T08:57:08.4127989Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11'
2025-12-04T08:57:08.4133975Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy'
2025-12-04T08:57:08.4139466Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef'
2025-12-04T08:57:08.4145408Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe'
2025-12-04T08:57:08.4176817Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'...
2025-12-04T08:57:08.6386845Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'...
2025-12-04T08:57:08.6387751Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'...
2025-12-04T08:57:08.6388366Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'...
2025-12-04T08:57:08.6414886Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'...
2025-12-04T08:57:11.2196480Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'...
2025-12-04T08:57:11.2198521Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NVTX'...
2025-12-04T08:57:11.2200673Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'...
2025-12-04T08:57:11.2201882Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'...
2025-12-04T08:57:11.2204864Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention'...
2025-12-04T08:57:11.2206096Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'...
2025-12-04T08:57:11.2329938Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpp-httplib'...
2025-12-04T08:57:11.2331019Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'...
2025-12-04T08:57:11.2332829Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'...
2025-12-04T08:57:11.2334518Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kleidiai'...
2025-12-04T08:57:11.2336416Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'...
2025-12-04T08:57:11.2338271Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'...
2025-12-04T08:57:11.2339316Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'...
2025-12-04T08:57:11.2340953Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'...
2025-12-04T08:57:11.2342838Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'...
2025-12-04T08:57:11.2343868Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/mimalloc'...
2025-12-04T08:57:11.2345727Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'...
2025-12-04T08:57:11.3531369Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'...
2025-12-04T08:57:11.5907107Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'...
2025-12-04T08:57:11.5908215Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'...
2025-12-04T08:57:11.6908412Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'...
2025-12-04T08:57:13.7246773Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'...
2025-12-04T08:57:13.7248124Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'...
2025-12-04T08:57:13.7249202Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'...
2025-12-04T08:57:13.7250246Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'...
2025-12-04T08:57:13.7251324Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'...
2025-12-04T08:57:13.8247710Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'...
2025-12-04T08:57:29.7475729Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/composable_kernel'...
2025-12-04T08:57:29.7476363Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'...
2025-12-04T08:57:29.7476926Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp'...
2025-12-04T08:57:29.7477736Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter'...
2025-12-04T08:57:29.7478442Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'...
2025-12-04T08:57:29.7666550Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f'
2025-12-04T08:57:29.7821442Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3'
2025-12-04T08:57:29.7935380Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1'
2025-12-04T08:57:29.8250298Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73'
2025-12-04T08:57:29.9149981Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6'
2025-12-04T08:57:29.9677818Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1'
2025-12-04T08:57:30.8528151Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883'
2025-12-04T08:57:31.0470354Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150'
2025-12-04T08:57:31.0493404Z Submodule '3rdparty/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T08:57:31.0523917Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter/3rdparty/composable_kernel'...
2025-12-04T08:57:35.3442309Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf'
2025-12-04T08:57:35.3718480Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f'
2025-12-04T08:57:35.7847957Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977'
2025-12-04T08:57:35.8361872Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246'
2025-12-04T08:57:35.9364607Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc'
2025-12-04T08:57:35.9878373Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396'
2025-12-04T08:57:36.7184909Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588'
2025-12-04T08:57:36.8936918Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4'
2025-12-04T08:57:36.8970392Z Submodule 'external/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/external/asmjit'
2025-12-04T08:57:36.8972992Z Submodule 'external/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/fbgemm/external/composable_kernel'
2025-12-04T08:57:36.8976417Z Submodule 'external/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/external/cpuinfo'
2025-12-04T08:57:36.8980009Z Submodule 'external/cutlass' (https://github.com/jwfromm/cutlass) registered for path 'third_party/fbgemm/external/cutlass'
2025-12-04T08:57:36.8983646Z Submodule 'external/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/external/googletest'
2025-12-04T08:57:36.8987370Z Submodule 'external/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 'third_party/fbgemm/external/hipify_torch'
2025-12-04T08:57:36.8991033Z Submodule 'external/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/fbgemm/external/json'
2025-12-04T08:57:36.9023460Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/asmjit'...
2025-12-04T08:57:38.0163939Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/hipify_torch'...
2025-12-04T08:57:38.0165389Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cpuinfo'...
2025-12-04T08:57:38.0166518Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/googletest'...
2025-12-04T08:57:38.1165070Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/composable_kernel'...
2025-12-04T08:57:41.2468646Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cutlass'...
2025-12-04T08:57:41.3470433Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/json'...
2025-12-04T08:57:43.2628130Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea'
2025-12-04T08:57:43.6654032Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977'
2025-12-04T08:57:43.7674460Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349'
2025-12-04T08:57:44.4624001Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8'
2025-12-04T08:57:44.5109937Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T08:57:44.5254529Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691'
2025-12-04T08:57:44.6383400Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03'
2025-12-04T08:57:44.7205750Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5'
2025-12-04T08:57:44.7228038Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T08:57:44.7231047Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass'
2025-12-04T08:57:44.7259623Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/composable_kernel'...
2025-12-04T08:57:48.7535150Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/cutlass'...
2025-12-04T08:57:49.0442030Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33'
2025-12-04T08:57:49.6617859Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420'
2025-12-04T08:57:49.8204982Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757'
2025-12-04T08:57:49.8527581Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f'
2025-12-04T08:57:49.8959766Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350'
2025-12-04T08:57:49.9271154Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341'
2025-12-04T08:57:49.9740441Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T08:57:49.9896043Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3'
2025-12-04T08:57:49.9914719Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn'
2025-12-04T08:57:49.9943791Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'...
2025-12-04T08:58:03.9759633Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d'
2025-12-04T08:58:03.9994932Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959'
2025-12-04T08:58:04.0853178Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943'
2025-12-04T08:58:04.0882322Z Submodule 'libkineto/third_party/dynolog' (https://github.com/facebookincubator/dynolog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T08:58:04.0885109Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T08:58:04.0888540Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T08:58:04.0920020Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog'...
2025-12-04T08:58:04.7102217Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'...
2025-12-04T08:58:05.1978087Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'...
2025-12-04T08:58:05.2904193Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1'
2025-12-04T08:58:05.2926934Z Submodule 'third_party/DCGM' (https://github.com/NVIDIA/DCGM.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T08:58:05.2929841Z Submodule 'third_party/cpr' (https://github.com/libcpr/cpr.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T08:58:05.2933208Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T08:58:05.2936654Z Submodule 'third_party/gflags' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T08:58:05.2940240Z Submodule 'third_party/glog' (https://github.com/google/glog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T08:58:05.2944092Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T08:58:05.2948601Z Submodule 'third_party/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T08:58:05.2952570Z Submodule 'third_party/pfs' (https://github.com/dtrugman/pfs.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T08:58:05.2956733Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T08:58:05.2989679Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'...
2025-12-04T08:58:07.1853228Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'...
2025-12-04T08:58:07.1854771Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'...
2025-12-04T08:58:07.1855933Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'...
2025-12-04T08:58:07.1856962Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'...
2025-12-04T08:58:07.1857941Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/glog'...
2025-12-04T08:58:07.1858919Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'...
2025-12-04T08:58:07.1859926Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'...
2025-12-04T08:58:07.2853644Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json'...
2025-12-04T08:58:11.7170787Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9'
2025-12-04T08:58:11.7394747Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400'
2025-12-04T08:58:11.7790463Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05'
2025-12-04T08:58:11.7958600Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067'
2025-12-04T08:58:11.7978124Z Submodule 'doc' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T08:58:11.8012172Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'...
2025-12-04T08:58:12.0499428Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4'
2025-12-04T08:58:12.0720249Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446'
2025-12-04T08:58:12.1190978Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T08:58:12.2280709Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5'
2025-12-04T08:58:12.2476214Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150'
2025-12-04T08:58:12.2687080Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a'
2025-12-04T08:58:12.2707064Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:58:12.2710663Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:58:12.2741544Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'...
2025-12-04T08:58:14.2156198Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'...
2025-12-04T08:58:14.4620901Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159'
2025-12-04T08:58:14.5122765Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
2025-12-04T08:58:14.5462576Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21'
2025-12-04T08:58:14.5934421Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T08:58:14.6543064Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe'
2025-12-04T08:58:14.6969839Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e'
2025-12-04T08:58:14.8232150Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72'
2025-12-04T08:58:15.3579044Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83'
2025-12-04T08:58:15.3616528Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11'
2025-12-04T08:58:15.3647360Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'...
2025-12-04T08:58:16.0820273Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4'
2025-12-04T08:58:16.1662810Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878'
2025-12-04T08:58:16.1684089Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark) registered for path 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T08:58:16.1687352Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T08:58:16.1690720Z Submodule 'third_party/ms-gsl' (https://github.com/microsoft/GSL) registered for path 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T08:58:16.1694258Z Submodule 'third_party/nlohmann-json' (https://github.com/nlohmann/json) registered for path 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T08:58:16.1697997Z Submodule 'third_party/opentelemetry-proto' (https://github.com/open-telemetry/opentelemetry-proto) registered for path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T08:58:16.1701583Z Submodule 'third_party/opentracing-cpp' (https://github.com/opentracing/opentracing-cpp.git) registered for path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T08:58:16.1705252Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T08:58:16.1708901Z Submodule 'tools/vcpkg' (https://github.com/Microsoft/vcpkg) registered for path 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T08:58:16.1740867Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/benchmark'...
2025-12-04T08:58:16.5626403Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp'...
2025-12-04T08:58:16.5627342Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto'...
2025-12-04T08:58:16.5628201Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp'...
2025-12-04T08:58:16.5628979Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl'...
2025-12-04T08:58:16.6627673Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/googletest'...
2025-12-04T08:58:17.1708537Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/nlohmann-json'...
2025-12-04T08:58:23.2784993Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/tools/vcpkg'...
2025-12-04T08:58:23.8520494Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2'
2025-12-04T08:58:23.8955203Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1'
2025-12-04T08:58:23.9139834Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa'
2025-12-04T08:58:24.0300665Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d'
2025-12-04T08:58:24.0461878Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce'
2025-12-04T08:58:24.0642298Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5'
2025-12-04T08:58:24.0838534Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d'
2025-12-04T08:58:24.0865543Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:58:24.0868860Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:58:24.0898479Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'...
2025-12-04T08:58:26.0557327Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'...
2025-12-04T08:58:26.3024922Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4'
2025-12-04T08:58:26.3516362Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
2025-12-04T08:58:26.9902498Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50'
2025-12-04T08:58:27.0047191Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa'
2025-12-04T08:58:27.2967524Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a'
2025-12-04T08:58:27.2992991Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark'
2025-12-04T08:58:27.2996313Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest'
2025-12-04T08:58:27.3026784Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'...
2025-12-04T08:58:27.8021981Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'...
2025-12-04T08:58:28.0774924Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8'
2025-12-04T08:58:28.1498878Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081'
2025-12-04T08:58:28.1624039Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900'
2025-12-04T08:58:28.1767419Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8'
2025-12-04T08:58:28.2265034Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8'
2025-12-04T08:58:28.2577705Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67'
2025-12-04T08:58:28.3036668Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68'
2025-12-04T08:58:28.3401144Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d'
2025-12-04T08:58:28.3421632Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest'
2025-12-04T08:58:28.3424890Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop'
2025-12-04T08:58:28.3428378Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv'
2025-12-04T08:58:28.3432007Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T08:58:28.3463424Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'...
2025-12-04T08:58:29.2932598Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'...
2025-12-04T08:58:29.2933499Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'...
2025-12-04T08:58:29.3015033Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'...
2025-12-04T08:58:29.3591326Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e'
2025-12-04T08:58:29.3781636Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281'
2025-12-04T08:58:29.4544013Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b'
2025-12-04T08:58:29.4867804Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef'
2025-12-04T08:58:29.4886392Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T08:58:29.4915890Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'...
2025-12-04T08:58:29.6639431Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
2025-12-04T08:58:29.6693220Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0
2025-12-04T08:58:29.7037457Z Entering 'android/libs/fbjni'
2025-12-04T08:58:29.7084320Z Entering 'third_party/FP16'
2025-12-04T08:58:29.7131687Z Entering 'third_party/FXdiv'
2025-12-04T08:58:29.7179821Z Entering 'third_party/NNPACK'
2025-12-04T08:58:29.7234578Z Entering 'third_party/NVTX'
2025-12-04T08:58:29.7282524Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T08:58:29.7331226Z Entering 'third_party/XNNPACK'
2025-12-04T08:58:29.7392914Z Entering 'third_party/aiter'
2025-12-04T08:58:29.7441161Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T08:58:29.7504836Z Entering 'third_party/benchmark'
2025-12-04T08:58:29.7557777Z Entering 'third_party/composable_kernel'
2025-12-04T08:58:29.7620742Z Entering 'third_party/cpp-httplib'
2025-12-04T08:58:29.7668076Z Entering 'third_party/cpuinfo'
2025-12-04T08:58:29.7714593Z Entering 'third_party/cudnn_frontend'
2025-12-04T08:58:29.7764458Z Entering 'third_party/cutlass'
2025-12-04T08:58:29.7820991Z Entering 'third_party/fbgemm'
2025-12-04T08:58:29.7870914Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T08:58:29.7919350Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T08:58:29.7975149Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T08:58:29.8022412Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T08:58:29.8077995Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T08:58:29.8129645Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T08:58:29.8177266Z Entering 'third_party/fbgemm/external/json'
2025-12-04T08:58:29.8235550Z Entering 'third_party/flash-attention'
2025-12-04T08:58:29.8281737Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T08:58:29.8336069Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T08:58:29.8392544Z Entering 'third_party/flatbuffers'
2025-12-04T08:58:29.8446062Z Entering 'third_party/fmt'
2025-12-04T08:58:29.8492913Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T08:58:29.8542099Z Entering 'third_party/gloo'
2025-12-04T08:58:29.8590512Z Entering 'third_party/googletest'
2025-12-04T08:58:29.8641160Z Entering 'third_party/ideep'
2025-12-04T08:58:29.8687699Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T08:58:29.8741236Z Entering 'third_party/ittapi'
2025-12-04T08:58:29.8791897Z Entering 'third_party/kineto'
2025-12-04T08:58:29.8838107Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T08:58:29.8885808Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T08:58:29.8935079Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T08:58:29.8981569Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T08:58:29.9030697Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T08:58:29.9075997Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T08:58:29.9126427Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T08:58:29.9173275Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T08:58:29.9221363Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T08:58:29.9268676Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T08:58:29.9314057Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T08:58:29.9360324Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:58:29.9410847Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:58:29.9468743Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T08:58:29.9520387Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T08:58:29.9571353Z Entering 'third_party/kleidiai'
2025-12-04T08:58:29.9620479Z Entering 'third_party/mimalloc'
2025-12-04T08:58:29.9667041Z Entering 'third_party/nlohmann'
2025-12-04T08:58:29.9715210Z Entering 'third_party/onnx'
2025-12-04T08:58:29.9777887Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T08:58:29.9828613Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T08:58:29.9877331Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T08:58:29.9924404Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T08:58:29.9971389Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T08:58:30.0019256Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T08:58:30.0065400Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T08:58:30.0111338Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T08:58:30.0162542Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T08:58:30.0208836Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:58:30.0259663Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:58:30.0310973Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T08:58:30.0378679Z Entering 'third_party/pocketfft'
2025-12-04T08:58:30.0428324Z Entering 'third_party/protobuf'
2025-12-04T08:58:30.0484966Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T08:58:30.0532109Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T08:58:30.0582849Z Entering 'third_party/psimd'
2025-12-04T08:58:30.0631895Z Entering 'third_party/pthreadpool'
2025-12-04T08:58:30.0680732Z Entering 'third_party/pybind11'
2025-12-04T08:58:30.0734666Z Entering 'third_party/python-peachpy'
2025-12-04T08:58:30.0781691Z Entering 'third_party/sleef'
2025-12-04T08:58:30.0830567Z Entering 'third_party/tensorpipe'
2025-12-04T08:58:30.0882802Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T08:58:30.0930370Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T08:58:30.0977994Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T08:58:30.1023468Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T08:58:30.1068873Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T08:58:30.1134779Z ##[endgroup]
2025-12-04T08:58:30.1135214Z ##[group]Persisting credentials for submodules
2025-12-04T08:58:30.1140286Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :"
2025-12-04T08:58:30.1465995Z Entering 'android/libs/fbjni'
2025-12-04T08:58:30.1530465Z Entering 'third_party/FP16'
2025-12-04T08:58:30.1593347Z Entering 'third_party/FXdiv'
2025-12-04T08:58:30.1658559Z Entering 'third_party/NNPACK'
2025-12-04T08:58:30.1723644Z Entering 'third_party/NVTX'
2025-12-04T08:58:30.1790819Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T08:58:30.1853941Z Entering 'third_party/XNNPACK'
2025-12-04T08:58:30.1932061Z Entering 'third_party/aiter'
2025-12-04T08:58:30.2001060Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T08:58:30.2078115Z Entering 'third_party/benchmark'
2025-12-04T08:58:30.2140744Z Entering 'third_party/composable_kernel'
2025-12-04T08:58:30.2210520Z Entering 'third_party/cpp-httplib'
2025-12-04T08:58:30.2272281Z Entering 'third_party/cpuinfo'
2025-12-04T08:58:30.2336396Z Entering 'third_party/cudnn_frontend'
2025-12-04T08:58:30.2399742Z Entering 'third_party/cutlass'
2025-12-04T08:58:30.2472541Z Entering 'third_party/fbgemm'
2025-12-04T08:58:30.2536794Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T08:58:30.2598986Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T08:58:30.2670437Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T08:58:30.2736186Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T08:58:30.2809370Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T08:58:30.2871707Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T08:58:30.2942025Z Entering 'third_party/fbgemm/external/json'
2025-12-04T08:58:30.3015548Z Entering 'third_party/flash-attention'
2025-12-04T08:58:30.3084669Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T08:58:30.3153691Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T08:58:30.3233965Z Entering 'third_party/flatbuffers'
2025-12-04T08:58:30.3303528Z Entering 'third_party/fmt'
2025-12-04T08:58:30.3364092Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T08:58:30.3428845Z Entering 'third_party/gloo'
2025-12-04T08:58:30.3492337Z Entering 'third_party/googletest'
2025-12-04T08:58:30.3566080Z Entering 'third_party/ideep'
2025-12-04T08:58:30.3628463Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T08:58:30.3699704Z Entering 'third_party/ittapi'
2025-12-04T08:58:30.3771567Z Entering 'third_party/kineto'
2025-12-04T08:58:30.3834149Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T08:58:30.3902247Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T08:58:30.3976173Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T08:58:30.4040586Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T08:58:30.4109778Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T08:58:30.4171504Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T08:58:30.4244319Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T08:58:30.4310599Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T08:58:30.4380023Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T08:58:30.4444346Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T08:58:30.4509219Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T08:58:30.4571550Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:58:30.4637319Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:58:30.4711139Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T08:58:30.4783062Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T08:58:30.4857028Z Entering 'third_party/kleidiai'
2025-12-04T08:58:30.4922575Z Entering 'third_party/mimalloc'
2025-12-04T08:58:30.4985797Z Entering 'third_party/nlohmann'
2025-12-04T08:58:30.5051160Z Entering 'third_party/onnx'
2025-12-04T08:58:30.5129067Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T08:58:30.5206359Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T08:58:30.5270358Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T08:58:30.5332625Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T08:58:30.5398745Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T08:58:30.5460692Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T08:58:30.5532895Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T08:58:30.5603611Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T08:58:30.5669246Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T08:58:30.5738681Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:58:30.5809664Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:58:30.5875386Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T08:58:30.5957456Z Entering 'third_party/pocketfft'
2025-12-04T08:58:30.6034273Z Entering 'third_party/protobuf'
2025-12-04T08:58:30.6107667Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T08:58:30.6177611Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T08:58:30.6245493Z Entering 'third_party/psimd'
2025-12-04T08:58:30.6312729Z Entering 'third_party/pthreadpool'
2025-12-04T08:58:30.6382448Z Entering 'third_party/pybind11'
2025-12-04T08:58:30.6448625Z Entering 'third_party/python-peachpy'
2025-12-04T08:58:30.6512929Z Entering 'third_party/sleef'
2025-12-04T08:58:30.6577902Z Entering 'third_party/tensorpipe'
2025-12-04T08:58:30.6641273Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T08:58:30.6702981Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T08:58:30.6774232Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T08:58:30.6835980Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T08:58:30.6897608Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T08:58:30.6987725Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url"
2025-12-04T08:58:30.7322174Z Entering 'android/libs/fbjni'
2025-12-04T08:58:30.7386944Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config	remote.origin.url
2025-12-04T08:58:30.7401952Z Entering 'third_party/FP16'
2025-12-04T08:58:30.7460467Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config	remote.origin.url
2025-12-04T08:58:30.7480783Z Entering 'third_party/FXdiv'
2025-12-04T08:58:30.7540553Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config	remote.origin.url
2025-12-04T08:58:30.7560728Z Entering 'third_party/NNPACK'
2025-12-04T08:58:30.7619498Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config	remote.origin.url
2025-12-04T08:58:30.7640409Z Entering 'third_party/NVTX'
2025-12-04T08:58:30.7703305Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config	remote.origin.url
2025-12-04T08:58:30.7724658Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T08:58:30.7783400Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config	remote.origin.url
2025-12-04T08:58:30.7803291Z Entering 'third_party/XNNPACK'
2025-12-04T08:58:30.7862081Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config	remote.origin.url
2025-12-04T08:58:30.7895726Z Entering 'third_party/aiter'
2025-12-04T08:58:30.7958797Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config	remote.origin.url
2025-12-04T08:58:30.7978998Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T08:58:30.8037022Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config	remote.origin.url
2025-12-04T08:58:30.8066263Z Entering 'third_party/benchmark'
2025-12-04T08:58:30.8123708Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T08:58:30.8143637Z Entering 'third_party/composable_kernel'
2025-12-04T08:58:30.8202483Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config	remote.origin.url
2025-12-04T08:58:30.8230826Z Entering 'third_party/cpp-httplib'
2025-12-04T08:58:30.8295739Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config	remote.origin.url
2025-12-04T08:58:30.8321123Z Entering 'third_party/cpuinfo'
2025-12-04T08:58:30.8383157Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config	remote.origin.url
2025-12-04T08:58:30.8403781Z Entering 'third_party/cudnn_frontend'
2025-12-04T08:58:30.8460555Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config	remote.origin.url
2025-12-04T08:58:30.8481064Z Entering 'third_party/cutlass'
2025-12-04T08:58:30.8544224Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config	remote.origin.url
2025-12-04T08:58:30.8572078Z Entering 'third_party/fbgemm'
2025-12-04T08:58:30.8629310Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config	remote.origin.url
2025-12-04T08:58:30.8650838Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T08:58:30.8708865Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config	remote.origin.url
2025-12-04T08:58:30.8727413Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T08:58:30.8784326Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config	remote.origin.url
2025-12-04T08:58:30.8810489Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T08:58:30.8869490Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config	remote.origin.url
2025-12-04T08:58:30.8889652Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T08:58:30.8947526Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config	remote.origin.url
2025-12-04T08:58:30.8975237Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T08:58:30.9032794Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config	remote.origin.url
2025-12-04T08:58:30.9051918Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T08:58:30.9111903Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config	remote.origin.url
2025-12-04T08:58:30.9131445Z Entering 'third_party/fbgemm/external/json'
2025-12-04T08:58:30.9202009Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config	remote.origin.url
2025-12-04T08:58:30.9225555Z Entering 'third_party/flash-attention'
2025-12-04T08:58:30.9283255Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config	remote.origin.url
2025-12-04T08:58:30.9302194Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T08:58:30.9372423Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config	remote.origin.url
2025-12-04T08:58:30.9397195Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T08:58:30.9462473Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config	remote.origin.url
2025-12-04T08:58:30.9491035Z Entering 'third_party/flatbuffers'
2025-12-04T08:58:30.9549396Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config	remote.origin.url
2025-12-04T08:58:30.9571964Z Entering 'third_party/fmt'
2025-12-04T08:58:30.9631502Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config	remote.origin.url
2025-12-04T08:58:30.9651565Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T08:58:30.9711301Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config	remote.origin.url
2025-12-04T08:58:30.9731718Z Entering 'third_party/gloo'
2025-12-04T08:58:30.9790797Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config	remote.origin.url
2025-12-04T08:58:30.9811040Z Entering 'third_party/googletest'
2025-12-04T08:58:30.9870715Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config	remote.origin.url
2025-12-04T08:58:30.9890803Z Entering 'third_party/ideep'
2025-12-04T08:58:30.9956276Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config	remote.origin.url
2025-12-04T08:58:30.9978697Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T08:58:31.0047054Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config	remote.origin.url
2025-12-04T08:58:31.0074541Z Entering 'third_party/ittapi'
2025-12-04T08:58:31.0137409Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config	remote.origin.url
2025-12-04T08:58:31.0157905Z Entering 'third_party/kineto'
2025-12-04T08:58:31.0217965Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config	remote.origin.url
2025-12-04T08:58:31.0237864Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T08:58:31.0298723Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config	remote.origin.url
2025-12-04T08:58:31.0318048Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T08:58:31.0378742Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config	remote.origin.url
2025-12-04T08:58:31.0400582Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T08:58:31.0458903Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config	remote.origin.url
2025-12-04T08:58:31.0479394Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T08:58:31.0540276Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config	remote.origin.url
2025-12-04T08:58:31.0560480Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T08:58:31.0620748Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config	remote.origin.url
2025-12-04T08:58:31.0639155Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T08:58:31.0701198Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config	remote.origin.url
2025-12-04T08:58:31.0724600Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T08:58:31.0783431Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config	remote.origin.url
2025-12-04T08:58:31.0803175Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T08:58:31.0863931Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config	remote.origin.url
2025-12-04T08:58:31.0883617Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T08:58:31.0943486Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config	remote.origin.url
2025-12-04T08:58:31.0963875Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T08:58:31.1023363Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config	remote.origin.url
2025-12-04T08:58:31.1041783Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T08:58:31.1107679Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T08:58:31.1125392Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:58:31.1184524Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T08:58:31.1206249Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:58:31.1265399Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T08:58:31.1289766Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T08:58:31.1349941Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config	remote.origin.url
2025-12-04T08:58:31.1369917Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T08:58:31.1429144Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config	remote.origin.url
2025-12-04T08:58:31.1458683Z Entering 'third_party/kleidiai'
2025-12-04T08:58:31.1518375Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config	remote.origin.url
2025-12-04T08:58:31.1539326Z Entering 'third_party/mimalloc'
2025-12-04T08:58:31.1597756Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config	remote.origin.url
2025-12-04T08:58:31.1618742Z Entering 'third_party/nlohmann'
2025-12-04T08:58:31.1678225Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config	remote.origin.url
2025-12-04T08:58:31.1699849Z Entering 'third_party/onnx'
2025-12-04T08:58:31.1768995Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config	remote.origin.url
2025-12-04T08:58:31.1807841Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T08:58:31.1864942Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T08:58:31.1887891Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T08:58:31.1947197Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config	remote.origin.url
2025-12-04T08:58:31.1971614Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T08:58:31.2029443Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T08:58:31.2049445Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T08:58:31.2106807Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config	remote.origin.url
2025-12-04T08:58:31.2126757Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T08:58:31.2183903Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config	remote.origin.url
2025-12-04T08:58:31.2202809Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T08:58:31.2264767Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config	remote.origin.url
2025-12-04T08:58:31.2284745Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T08:58:31.2343475Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config	remote.origin.url
2025-12-04T08:58:31.2362290Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T08:58:31.2425370Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config	remote.origin.url
2025-12-04T08:58:31.2443986Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T08:58:31.2502462Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T08:58:31.2520415Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:58:31.2579161Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T08:58:31.2606245Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:58:31.2664290Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T08:58:31.2685903Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T08:58:31.2742934Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config	remote.origin.url
2025-12-04T08:58:31.2780469Z Entering 'third_party/pocketfft'
2025-12-04T08:58:31.2842838Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config	remote.origin.url
2025-12-04T08:58:31.2862165Z Entering 'third_party/protobuf'
2025-12-04T08:58:31.2922108Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config	remote.origin.url
2025-12-04T08:58:31.2943618Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T08:58:31.3001961Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T08:58:31.3022126Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T08:58:31.3088177Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config	remote.origin.url
2025-12-04T08:58:31.3110890Z Entering 'third_party/psimd'
2025-12-04T08:58:31.3176109Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config	remote.origin.url
2025-12-04T08:58:31.3198106Z Entering 'third_party/pthreadpool'
2025-12-04T08:58:31.3256165Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config	remote.origin.url
2025-12-04T08:58:31.3276984Z Entering 'third_party/pybind11'
2025-12-04T08:58:31.3334589Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T08:58:31.3354049Z Entering 'third_party/python-peachpy'
2025-12-04T08:58:31.3411466Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config	remote.origin.url
2025-12-04T08:58:31.3431788Z Entering 'third_party/sleef'
2025-12-04T08:58:31.3489741Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config	remote.origin.url
2025-12-04T08:58:31.3510069Z Entering 'third_party/tensorpipe'
2025-12-04T08:58:31.3577397Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config	remote.origin.url
2025-12-04T08:58:31.3596603Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T08:58:31.3661375Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config	remote.origin.url
2025-12-04T08:58:31.3680847Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T08:58:31.3748499Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config	remote.origin.url
2025-12-04T08:58:31.3768321Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T08:58:31.3824334Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config	remote.origin.url
2025-12-04T08:58:31.3843475Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T08:58:31.3900889Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T08:58:31.3919196Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T08:58:31.3978989Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config	remote.origin.url
2025-12-04T08:58:31.5029027Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:'
2025-12-04T08:58:31.5354589Z Entering 'android/libs/fbjni'
2025-12-04T08:58:31.5402390Z Entering 'third_party/FP16'
2025-12-04T08:58:31.5450166Z Entering 'third_party/FXdiv'
2025-12-04T08:58:31.5499148Z Entering 'third_party/NNPACK'
2025-12-04T08:58:31.5548708Z Entering 'third_party/NVTX'
2025-12-04T08:58:31.5595146Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T08:58:31.5643115Z Entering 'third_party/XNNPACK'
2025-12-04T08:58:31.5704506Z Entering 'third_party/aiter'
2025-12-04T08:58:31.5751999Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T08:58:31.5808444Z Entering 'third_party/benchmark'
2025-12-04T08:58:31.5857069Z Entering 'third_party/composable_kernel'
2025-12-04T08:58:31.5910534Z Entering 'third_party/cpp-httplib'
2025-12-04T08:58:31.5959189Z Entering 'third_party/cpuinfo'
2025-12-04T08:58:31.6008626Z Entering 'third_party/cudnn_frontend'
2025-12-04T08:58:31.6057111Z Entering 'third_party/cutlass'
2025-12-04T08:58:31.6111035Z Entering 'third_party/fbgemm'
2025-12-04T08:58:31.6163757Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T08:58:31.6211138Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T08:58:31.6263692Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T08:58:31.6311833Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T08:58:31.6370782Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T08:58:31.6419485Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T08:58:31.6464604Z Entering 'third_party/fbgemm/external/json'
2025-12-04T08:58:31.6515387Z Entering 'third_party/flash-attention'
2025-12-04T08:58:31.6562735Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T08:58:31.6616110Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T08:58:31.6675513Z Entering 'third_party/flatbuffers'
2025-12-04T08:58:31.6734997Z Entering 'third_party/fmt'
2025-12-04T08:58:31.6782784Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T08:58:31.6830408Z Entering 'third_party/gloo'
2025-12-04T08:58:31.6880698Z Entering 'third_party/googletest'
2025-12-04T08:58:31.6934039Z Entering 'third_party/ideep'
2025-12-04T08:58:31.6980169Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T08:58:31.7037500Z Entering 'third_party/ittapi'
2025-12-04T08:58:31.7084048Z Entering 'third_party/kineto'
2025-12-04T08:58:31.7131655Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T08:58:31.7178764Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T08:58:31.7230870Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T08:58:31.7280202Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T08:58:31.7337432Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T08:58:31.7382151Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T08:58:31.7434045Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T08:58:31.7481838Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T08:58:31.7531507Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T08:58:31.7580817Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T08:58:31.7633599Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T08:58:31.7680260Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:58:31.7733906Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:58:31.7788455Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T08:58:31.7840397Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T08:58:31.7899136Z Entering 'third_party/kleidiai'
2025-12-04T08:58:31.7950030Z Entering 'third_party/mimalloc'
2025-12-04T08:58:31.7999637Z Entering 'third_party/nlohmann'
2025-12-04T08:58:31.8050910Z Entering 'third_party/onnx'
2025-12-04T08:58:31.8113668Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T08:58:31.8168721Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T08:58:31.8222486Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T08:58:31.8270255Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T08:58:31.8324637Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T08:58:31.8371163Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T08:58:31.8424127Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T08:58:31.8470842Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T08:58:31.8518836Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T08:58:31.8562849Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:58:31.8611909Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:58:31.8661159Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T08:58:31.8730548Z Entering 'third_party/pocketfft'
2025-12-04T08:58:31.8779380Z Entering 'third_party/protobuf'
2025-12-04T08:58:31.8831031Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T08:58:31.8878906Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T08:58:31.8935703Z Entering 'third_party/psimd'
2025-12-04T08:58:31.8982635Z Entering 'third_party/pthreadpool'
2025-12-04T08:58:31.9031458Z Entering 'third_party/pybind11'
2025-12-04T08:58:31.9088206Z Entering 'third_party/python-peachpy'
2025-12-04T08:58:31.9138721Z Entering 'third_party/sleef'
2025-12-04T08:58:31.9188939Z Entering 'third_party/tensorpipe'
2025-12-04T08:58:31.9237856Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T08:58:31.9283642Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T08:58:31.9330914Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T08:58:31.9377639Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T08:58:31.9422905Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T08:58:31.9492918Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:'
2025-12-04T08:58:31.9834562Z Entering 'android/libs/fbjni'
2025-12-04T08:58:31.9881915Z Entering 'third_party/FP16'
2025-12-04T08:58:31.9931724Z Entering 'third_party/FXdiv'
2025-12-04T08:58:31.9980009Z Entering 'third_party/NNPACK'
2025-12-04T08:58:32.0035658Z Entering 'third_party/NVTX'
2025-12-04T08:58:32.0084073Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T08:58:32.0131527Z Entering 'third_party/XNNPACK'
2025-12-04T08:58:32.0193727Z Entering 'third_party/aiter'
2025-12-04T08:58:32.0240264Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T08:58:32.0298221Z Entering 'third_party/benchmark'
2025-12-04T08:58:32.0345511Z Entering 'third_party/composable_kernel'
2025-12-04T08:58:32.0400577Z Entering 'third_party/cpp-httplib'
2025-12-04T08:58:32.0457942Z Entering 'third_party/cpuinfo'
2025-12-04T08:58:32.0510890Z Entering 'third_party/cudnn_frontend'
2025-12-04T08:58:32.0572102Z Entering 'third_party/cutlass'
2025-12-04T08:58:32.0629905Z Entering 'third_party/fbgemm'
2025-12-04T08:58:32.0688112Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T08:58:32.0734248Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T08:58:32.0788405Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T08:58:32.0834771Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T08:58:32.0889327Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T08:58:32.0934861Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T08:58:32.0981475Z Entering 'third_party/fbgemm/external/json'
2025-12-04T08:58:32.1037754Z Entering 'third_party/flash-attention'
2025-12-04T08:58:32.1087191Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T08:58:32.1139295Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T08:58:32.1195658Z Entering 'third_party/flatbuffers'
2025-12-04T08:58:32.1246847Z Entering 'third_party/fmt'
2025-12-04T08:58:32.1293641Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T08:58:32.1342668Z Entering 'third_party/gloo'
2025-12-04T08:58:32.1390863Z Entering 'third_party/googletest'
2025-12-04T08:58:32.1444910Z Entering 'third_party/ideep'
2025-12-04T08:58:32.1490850Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T08:58:32.1546314Z Entering 'third_party/ittapi'
2025-12-04T08:58:32.1594026Z Entering 'third_party/kineto'
2025-12-04T08:58:32.1640587Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T08:58:32.1692501Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T08:58:32.1742592Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T08:58:32.1791293Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T08:58:32.1842864Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T08:58:32.1890023Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T08:58:32.1938589Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T08:58:32.1984891Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T08:58:32.2032695Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T08:58:32.2082248Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T08:58:32.2135306Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T08:58:32.2181097Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:58:32.2232932Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:58:32.2289147Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T08:58:32.2339496Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T08:58:32.2395346Z Entering 'third_party/kleidiai'
2025-12-04T08:58:32.2448557Z Entering 'third_party/mimalloc'
2025-12-04T08:58:32.2494746Z Entering 'third_party/nlohmann'
2025-12-04T08:58:32.2543736Z Entering 'third_party/onnx'
2025-12-04T08:58:32.2606340Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T08:58:32.2657751Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T08:58:32.2705850Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T08:58:32.2753491Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T08:58:32.2800929Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T08:58:32.2853327Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T08:58:32.2901731Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T08:58:32.2954510Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T08:58:32.3001111Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T08:58:32.3061030Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:58:32.3110892Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:58:32.3165382Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T08:58:32.3230851Z Entering 'third_party/pocketfft'
2025-12-04T08:58:32.3287764Z Entering 'third_party/protobuf'
2025-12-04T08:58:32.3335861Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T08:58:32.3381722Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T08:58:32.3438514Z Entering 'third_party/psimd'
2025-12-04T08:58:32.3488464Z Entering 'third_party/pthreadpool'
2025-12-04T08:58:32.3534428Z Entering 'third_party/pybind11'
2025-12-04T08:58:32.3582317Z Entering 'third_party/python-peachpy'
2025-12-04T08:58:32.3631259Z Entering 'third_party/sleef'
2025-12-04T08:58:32.3683430Z Entering 'third_party/tensorpipe'
2025-12-04T08:58:32.3730780Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T08:58:32.3778466Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T08:58:32.3827600Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T08:58:32.3873552Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T08:58:32.3920492Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T08:58:32.3993784Z ##[endgroup]
2025-12-04T08:58:32.4033942Z [command]/usr/bin/git log -1 --format=%H
2025-12-04T08:58:32.4056916Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T08:58:32.4177790Z ##[group]Run cd "${GITHUB_WORKSPACE}"
2025-12-04T08:58:32.4178052Z [36;1mcd "${GITHUB_WORKSPACE}"[0m
2025-12-04T08:58:32.4178285Z [36;1m# Clean stale submodule dirs[0m
2025-12-04T08:58:32.4178518Z [36;1mif [ -z "${NO_SUDO}" ]; then[0m
2025-12-04T08:58:32.4178792Z [36;1m  sudo git submodule foreach --recursive git clean -ffdx[0m
2025-12-04T08:58:32.4179223Z [36;1melse[0m
2025-12-04T08:58:32.4179564Z [36;1m  git submodule foreach --recursive git clean -ffdx[0m
2025-12-04T08:58:32.4179823Z [36;1mfi[0m
2025-12-04T08:58:32.4189029Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:58:32.4189318Z env:
2025-12-04T08:58:32.4189477Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:32.4189653Z   NO_SUDO: true
2025-12-04T08:58:32.4189820Z ##[endgroup]
2025-12-04T08:58:32.4541310Z Entering 'android/libs/fbjni'
2025-12-04T08:58:32.4579860Z Entering 'third_party/FP16'
2025-12-04T08:58:32.4624083Z Entering 'third_party/FXdiv'
2025-12-04T08:58:32.4659376Z Entering 'third_party/NNPACK'
2025-12-04T08:58:32.4700288Z Entering 'third_party/NVTX'
2025-12-04T08:58:32.4744473Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T08:58:32.4781868Z Entering 'third_party/XNNPACK'
2025-12-04T08:58:32.4906725Z Entering 'third_party/aiter'
2025-12-04T08:58:32.4952810Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T08:58:32.5069263Z Entering 'third_party/benchmark'
2025-12-04T08:58:32.5108228Z Entering 'third_party/composable_kernel'
2025-12-04T08:58:32.5231518Z Entering 'third_party/cpp-httplib'
2025-12-04T08:58:32.5285926Z Entering 'third_party/cpuinfo'
2025-12-04T08:58:32.5325301Z Entering 'third_party/cudnn_frontend'
2025-12-04T08:58:32.5364192Z Entering 'third_party/cutlass'
2025-12-04T08:58:32.5471297Z Entering 'third_party/fbgemm'
2025-12-04T08:58:32.5545651Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T08:58:32.5581723Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T08:58:32.5710144Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T08:58:32.5750860Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T08:58:32.5857984Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T08:58:32.5901007Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T08:58:32.5940945Z Entering 'third_party/fbgemm/external/json'
2025-12-04T08:58:32.5994610Z Entering 'third_party/flash-attention'
2025-12-04T08:58:32.6037833Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T08:58:32.6140435Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T08:58:32.6234599Z Entering 'third_party/flatbuffers'
2025-12-04T08:58:32.6317189Z Entering 'third_party/fmt'
2025-12-04T08:58:32.6353766Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T08:58:32.6391880Z Entering 'third_party/gloo'
2025-12-04T08:58:32.6430972Z Entering 'third_party/googletest'
2025-12-04T08:58:32.6482649Z Entering 'third_party/ideep'
2025-12-04T08:58:32.6518401Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T08:58:32.6612218Z Entering 'third_party/ittapi'
2025-12-04T08:58:32.6651262Z Entering 'third_party/kineto'
2025-12-04T08:58:32.6690950Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T08:58:32.6732814Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T08:58:32.6783295Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T08:58:32.6824528Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T08:58:32.6861545Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T08:58:32.6893224Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T08:58:32.6937214Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T08:58:32.6972881Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T08:58:32.7011326Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T08:58:32.7064867Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T08:58:32.7100743Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T08:58:32.7141511Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:58:32.7195570Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:58:32.7239591Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T08:58:32.7283915Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T08:58:32.7324496Z Entering 'third_party/kleidiai'
2025-12-04T08:58:32.7371899Z Entering 'third_party/mimalloc'
2025-12-04T08:58:32.7409514Z Entering 'third_party/nlohmann'
2025-12-04T08:58:32.7461016Z Entering 'third_party/onnx'
2025-12-04T08:58:32.7808045Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T08:58:32.7850908Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T08:58:32.7920724Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T08:58:32.7958038Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T08:58:32.8003605Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T08:58:32.8039355Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T08:58:32.8094532Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T08:58:32.8131679Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T08:58:32.8168823Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T08:58:32.8204479Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:58:32.8256084Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:58:32.8296363Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T08:58:32.8572859Z Entering 'third_party/pocketfft'
2025-12-04T08:58:32.8612704Z Entering 'third_party/protobuf'
2025-12-04T08:58:32.8702817Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T08:58:32.8739174Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T08:58:32.8782312Z Entering 'third_party/psimd'
2025-12-04T08:58:32.8820906Z Entering 'third_party/pthreadpool'
2025-12-04T08:58:32.8857990Z Entering 'third_party/pybind11'
2025-12-04T08:58:32.8906686Z Entering 'third_party/python-peachpy'
2025-12-04T08:58:32.8942055Z Entering 'third_party/sleef'
2025-12-04T08:58:32.8981301Z Entering 'third_party/tensorpipe'
2025-12-04T08:58:32.9021589Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T08:58:32.9059216Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T08:58:32.9100124Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T08:58:32.9141380Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T08:58:32.9180626Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T08:58:32.9333558Z Prepare all required actions
2025-12-04T08:58:32.9334049Z Getting action download info
2025-12-04T08:58:33.0827904Z ##[group]Run ./.github/actions/setup-linux
2025-12-04T08:58:33.0828134Z env:
2025-12-04T08:58:33.0828294Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:33.0828478Z ##[endgroup]
2025-12-04T08:58:33.0860950Z ##[group]Run set -euo pipefail
2025-12-04T08:58:33.0861223Z [36;1mset -euo pipefail[0m
2025-12-04T08:58:33.0861435Z [36;1mfunction get_ec2_metadata() {[0m
2025-12-04T08:58:33.0861704Z [36;1m  # Pulled from instance metadata endpoint for EC2[0m
2025-12-04T08:58:33.0862167Z [36;1m  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html[0m
2025-12-04T08:58:33.0862566Z [36;1m  category=$1[0m
2025-12-04T08:58:33.0862827Z [36;1m  # If it is GCP runner (runner name contains gcp), do not run this[0m
2025-12-04T08:58:33.0863128Z [36;1m  runner_name_str=i-0e5520d20214059b0[0m
2025-12-04T08:58:33.0863400Z [36;1m  if [[ -f /.inarc ]]; then[0m
2025-12-04T08:58:33.0863644Z [36;1m    echo "ARC Runner, no info on ec2 metadata"[0m
2025-12-04T08:58:33.0863918Z [36;1m  elif [[ $runner_name_str == *"gcp"* ]]; then[0m
2025-12-04T08:58:33.0864245Z [36;1m    echo "Runner is from Google Cloud Platform, No info on ec2 metadata"[0m
2025-12-04T08:58:33.0864551Z [36;1m  else[0m
2025-12-04T08:58:33.0865324Z [36;1m    curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}"[0m
2025-12-04T08:58:33.0865966Z [36;1m  fi[0m
2025-12-04T08:58:33.0866122Z [36;1m}[0m
2025-12-04T08:58:33.0866306Z [36;1mecho "ami-id: $(get_ec2_metadata ami-id)"[0m
2025-12-04T08:58:33.0866597Z [36;1mecho "instance-id: $(get_ec2_metadata instance-id)"[0m
2025-12-04T08:58:33.0866943Z [36;1mecho "instance-type: $(get_ec2_metadata instance-type)"[0m
2025-12-04T08:58:33.0867246Z [36;1mecho "system info $(uname -a)"[0m
2025-12-04T08:58:33.0874937Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:58:33.0875234Z env:
2025-12-04T08:58:33.0875391Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:33.0875570Z ##[endgroup]
2025-12-04T08:58:33.1016736Z ami-id: ami-08982f1c5bf93d976
2025-12-04T08:58:33.1126755Z instance-id: i-0e5520d20214059b0
2025-12-04T08:58:33.1227203Z instance-type: g6.4xlarge
2025-12-04T08:58:33.1239760Z system info Linux ip-10-1-34-86.ec2.internal 6.1.150-174.273.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Sep  9 12:21:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
2025-12-04T08:58:33.1259112Z ##[group]Run if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi
2025-12-04T08:58:33.1259453Z [36;1mif [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi[0m
2025-12-04T08:58:33.1266432Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:58:33.1266711Z env:
2025-12-04T08:58:33.1266866Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:33.1267051Z ##[endgroup]
2025-12-04T08:58:34.5592074Z Thu Dec  4 08:58:34 2025       
2025-12-04T08:58:34.5592986Z +-----------------------------------------------------------------------------------------+
2025-12-04T08:58:34.5594098Z | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
2025-12-04T08:58:34.5595115Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T08:58:34.5596184Z | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
2025-12-04T08:58:34.5597378Z | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
2025-12-04T08:58:34.5598270Z |                                         |                        |               MIG M. |
2025-12-04T08:58:34.5598657Z |=========================================+========================+======================|
2025-12-04T08:58:34.5662546Z |   0  NVIDIA L4                      Off |   00000000:35:00.0 Off |                    0 |
2025-12-04T08:58:34.5663356Z | N/A   36C    P0             30W /   72W |       0MiB /  23034MiB |      4%      Default |
2025-12-04T08:58:34.5663748Z |                                         |                        |                  N/A |
2025-12-04T08:58:34.5664115Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T08:58:34.5664608Z 
2025-12-04T08:58:34.5664781Z +-----------------------------------------------------------------------------------------+
2025-12-04T08:58:34.5665193Z | Processes:                                                                              |
2025-12-04T08:58:34.5665596Z |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
2025-12-04T08:58:34.5665962Z |        ID   ID                                                               Usage      |
2025-12-04T08:58:34.5666263Z |=========================================================================================|
2025-12-04T08:58:34.5667301Z |  No running processes found                                                             |
2025-12-04T08:58:34.5667741Z +-----------------------------------------------------------------------------------------+
2025-12-04T08:58:34.8891374Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T08:58:34.8892234Z [36;1mecho "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"[0m
2025-12-04T08:58:34.8902498Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:58:34.8902784Z env:
2025-12-04T08:58:34.8902938Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:34.8903133Z ##[endgroup]
2025-12-04T08:58:34.8989771Z ##[group]Run if systemctl is-active --quiet docker; then
2025-12-04T08:58:34.8990174Z [36;1mif systemctl is-active --quiet docker; then[0m
2025-12-04T08:58:34.8990520Z [36;1m    echo "Docker daemon is running...";[0m
2025-12-04T08:58:34.8990826Z [36;1melse[0m
2025-12-04T08:58:34.8991132Z [36;1m    echo "Starting docker daemon..." && sudo systemctl start docker;[0m
2025-12-04T08:58:34.8991502Z [36;1mfi[0m
2025-12-04T08:58:34.8999063Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:58:34.8999339Z env:
2025-12-04T08:58:34.8999489Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:34.8999689Z ##[endgroup]
2025-12-04T08:58:34.9086457Z Docker daemon is running...
2025-12-04T08:58:34.9122866Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T08:58:34.9123083Z with:
2025-12-04T08:58:34.9123230Z   shell: bash
2025-12-04T08:58:34.9123387Z   timeout_minutes: 5
2025-12-04T08:58:34.9123557Z   max_attempts: 3
2025-12-04T08:58:34.9123720Z   retry_wait_seconds: 30
2025-12-04T08:58:34.9125371Z   command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
    --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"

# For LF Runners we need to make sure we also login to Meta's ECR docker registry too.
META_AWS_ACCOUNT_ID=308535385114
if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then
    aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
        --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
fi

2025-12-04T08:58:34.9127010Z   polling_interval_seconds: 1
2025-12-04T08:58:34.9127215Z   warning_on_retry: true
2025-12-04T08:58:34.9127402Z   continue_on_error: false
2025-12-04T08:58:34.9127576Z env:
2025-12-04T08:58:34.9127727Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:34.9127912Z   AWS_RETRY_MODE: standard
2025-12-04T08:58:34.9128089Z   AWS_MAX_ATTEMPTS: 5
2025-12-04T08:58:34.9128273Z   AWS_DEFAULT_REGION: us-east-1
2025-12-04T08:58:34.9128470Z ##[endgroup]
2025-12-04T08:58:35.9086769Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T08:58:35.9088144Z Configure a credential helper to remove this warning. See
2025-12-04T08:58:35.9088715Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T08:58:35.9089053Z 
2025-12-04T08:58:35.9089150Z Login Succeeded
2025-12-04T08:58:36.3720246Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T08:58:36.3720853Z Configure a credential helper to remove this warning. See
2025-12-04T08:58:36.3721413Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T08:58:36.3721751Z 
2025-12-04T08:58:36.3721848Z Login Succeeded
2025-12-04T08:58:36.9885628Z Command completed after 1 attempt(s).
2025-12-04T08:58:36.9948416Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T08:58:36.9948789Z [36;1menv | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"[0m
2025-12-04T08:58:36.9949111Z [36;1menv | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}"[0m
2025-12-04T08:58:36.9958271Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:58:36.9958546Z env:
2025-12-04T08:58:36.9958705Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:36.9958895Z ##[endgroup]
2025-12-04T08:58:37.0055299Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T08:58:37.0055714Z [36;1m# ignore expansion of "docker ps -q" since it could be empty[0m
2025-12-04T08:58:37.0056212Z [36;1m# shellcheck disable=SC2046[0m
2025-12-04T08:58:37.0056456Z [36;1mdocker stop $(docker ps -q) || true[0m
2025-12-04T08:58:37.0056704Z [36;1m# Prune all of the docker images[0m
2025-12-04T08:58:37.0056936Z [36;1mdocker system prune -af[0m
2025-12-04T08:58:37.0064057Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:58:37.0064333Z env:
2025-12-04T08:58:37.0064494Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:37.0064673Z ##[endgroup]
2025-12-04T08:58:37.0304444Z "docker stop" requires at least 1 argument.
2025-12-04T08:58:37.0304985Z See 'docker stop --help'.
2025-12-04T08:58:37.0305255Z 
2025-12-04T08:58:37.0305523Z Usage:  docker stop [OPTIONS] CONTAINER [CONTAINER...]
2025-12-04T08:58:37.0305788Z 
2025-12-04T08:58:37.0305895Z Stop one or more running containers
2025-12-04T08:58:37.0569903Z Total reclaimed space: 0B
2025-12-04T08:58:37.0721553Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main
2025-12-04T08:58:37.0721919Z with:
2025-12-04T08:58:37.0722507Z   docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:37.0723147Z   use-custom-docker-registry: true
2025-12-04T08:58:37.0723370Z   docker-build-dir: .ci/docker
2025-12-04T08:58:37.0723580Z   docker-build-script: ./build.sh
2025-12-04T08:58:37.0723793Z   working-directory: .
2025-12-04T08:58:37.0724036Z   docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T08:58:37.0724320Z   force-push: false
2025-12-04T08:58:37.0724482Z env:
2025-12-04T08:58:37.0724624Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:37.0724806Z ##[endgroup]
2025-12-04T08:58:37.0740354Z ##[group]Run set -ex
2025-12-04T08:58:37.0740579Z [36;1mset -ex[0m
2025-12-04T08:58:37.0740739Z [36;1m[0m
2025-12-04T08:58:37.0741050Z [36;1m# If the docker build directory or the build script doesn't exist, the action will[0m
2025-12-04T08:58:37.0741523Z [36;1m# gracefully return the docker image name as it is.  Pulling docker image in Linux[0m
2025-12-04T08:58:37.0741940Z [36;1m# job could then download the pre-built image as usual[0m
2025-12-04T08:58:37.0742422Z [36;1mif [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then[0m
2025-12-04T08:58:37.0742866Z [36;1m  echo "skip=false" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T08:58:37.0743101Z [36;1melse[0m
2025-12-04T08:58:37.0743295Z [36;1m  echo "skip=true" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T08:58:37.0743626Z [36;1m  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T08:58:37.0743907Z [36;1m[0m
2025-12-04T08:58:37.0744293Z [36;1m  echo "Not using custom ECR registry.  Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..."[0m
2025-12-04T08:58:37.0744751Z [36;1m  exit 0[0m
2025-12-04T08:58:37.0744901Z [36;1mfi[0m
2025-12-04T08:58:37.0745049Z [36;1m[0m
2025-12-04T08:58:37.0745294Z [36;1mif [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then[0m
2025-12-04T08:58:37.0745726Z [36;1m  # The docker image name already includes the ECR prefix and tag, so we can just[0m
2025-12-04T08:58:37.0746094Z [36;1m  # use it as it is, but first let's extract the tag[0m
2025-12-04T08:58:37.0746432Z [36;1m  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}')[0m
2025-12-04T08:58:37.0746791Z [36;1m  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T08:58:37.0747148Z [36;1m  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T08:58:37.0747425Z [36;1melse[0m
2025-12-04T08:58:37.0747614Z [36;1m  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then[0m
2025-12-04T08:58:37.0747888Z [36;1m    CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:}[0m
2025-12-04T08:58:37.0748159Z [36;1m    DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*}[0m
2025-12-04T08:58:37.0748396Z [36;1m  fi[0m
2025-12-04T08:58:37.0748712Z [36;1m  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}")[0m
2025-12-04T08:58:37.0749299Z [36;1m  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T08:58:37.0749742Z [36;1m  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T08:58:37.0750228Z [36;1m  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T08:58:37.0750527Z [36;1mfi[0m
2025-12-04T08:58:37.0758000Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:58:37.0758267Z env:
2025-12-04T08:58:37.0758420Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:37.0758599Z   REPO_NAME: pytorch
2025-12-04T08:58:37.0759303Z   DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:37.0760002Z   DOCKER_BUILD_DIR: .ci/docker
2025-12-04T08:58:37.0760208Z   DOCKER_BUILD_SCRIPT: ./build.sh
2025-12-04T08:58:37.0760470Z   DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T08:58:37.0760752Z   USE_CUSTOM_DOCKER_REGISTRY: true
2025-12-04T08:58:37.0760953Z   CUSTOM_TAG_PREFIX: 
2025-12-04T08:58:37.0761118Z ##[endgroup]
2025-12-04T08:58:37.0787104Z + [[ -d .ci/docker ]]
2025-12-04T08:58:37.0787383Z + [[ -f .ci/docker/./build.sh ]]
2025-12-04T08:58:37.0787637Z + [[ true == \t\r\u\e ]]
2025-12-04T08:58:37.0787856Z + echo skip=false
2025-12-04T08:58:37.0788823Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]]
2025-12-04T08:58:37.0794485Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:37.0795145Z ++ awk -F '[:,]' '{print $2}'
2025-12-04T08:58:37.0819728Z + DOCKER_TAG=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:37.0820572Z + echo docker-tag=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:37.0821630Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:37.0843393Z ##[group]Run set +e
2025-12-04T08:58:37.0843620Z [36;1mset +e[0m
2025-12-04T08:58:37.0843781Z [36;1mset -x[0m
2025-12-04T08:58:37.0843938Z [36;1m[0m
2025-12-04T08:58:37.0844090Z [36;1mlogin() {[0m
2025-12-04T08:58:37.0844442Z [36;1m  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1"[0m
2025-12-04T08:58:37.0844820Z [36;1m}[0m
2025-12-04T08:58:37.0844977Z [36;1m[0m
2025-12-04T08:58:37.0845121Z [36;1mretry () {[0m
2025-12-04T08:58:37.0845319Z [36;1m  $*  || (sleep 1 && $*) || (sleep 2 && $*)[0m
2025-12-04T08:58:37.0845546Z [36;1m}[0m
2025-12-04T08:58:37.0845686Z [36;1m[0m
2025-12-04T08:58:37.0845853Z [36;1mretry login "${DOCKER_REGISTRY}"[0m
2025-12-04T08:58:37.0846063Z [36;1m[0m
2025-12-04T08:58:37.0846215Z [36;1mSTART_TIME=$(date +%s)[0m
2025-12-04T08:58:37.0846421Z [36;1m# Wait up to 120 minutes[0m
2025-12-04T08:58:37.0846683Z [36;1mwhile [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do[0m
2025-12-04T08:58:37.0847028Z [36;1m  # Check if image already exists, if it does then skip building it[0m
2025-12-04T08:58:37.0847382Z [36;1m  if docker manifest inspect "${DOCKER_IMAGE}"; then[0m
2025-12-04T08:58:37.0847637Z [36;1m    exit 0[0m
2025-12-04T08:58:37.0847803Z [36;1m  fi[0m
2025-12-04T08:58:37.0847945Z [36;1m[0m
2025-12-04T08:58:37.0848217Z [36;1m  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can[0m
2025-12-04T08:58:37.0848678Z [36;1m  # use this to differentiate between the Docker build and regular build jobs. For the[0m
2025-12-04T08:58:37.0849284Z [36;1m  # latter, it will wait for the Docker images to become available before continuing[0m
2025-12-04T08:58:37.0849642Z [36;1m  if [ "${DOCKER_PUSH:-false}" == "true" ]; then[0m
2025-12-04T08:58:37.0849938Z [36;1m    # It's a Docker build job, let's build the image[0m
2025-12-04T08:58:37.0850185Z [36;1m    break[0m
2025-12-04T08:58:37.0850348Z [36;1m  else[0m
2025-12-04T08:58:37.0850582Z [36;1m    # It's a regular build job, wait for the image to become available[0m
2025-12-04T08:58:37.0850877Z [36;1m    sleep 300[0m
2025-12-04T08:58:37.0851054Z [36;1m  fi[0m
2025-12-04T08:58:37.0851201Z [36;1mdone[0m
2025-12-04T08:58:37.0851347Z [36;1m[0m
2025-12-04T08:58:37.0851595Z [36;1m# NB: This part requires a full checkout. Otherwise, the merge base will[0m
2025-12-04T08:58:37.0852122Z [36;1m# be empty.  The default action would be to continue rebuild the image[0m
2025-12-04T08:58:37.0852497Z [36;1mif [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then[0m
2025-12-04T08:58:37.0852821Z [36;1m  # if we're on the base branch then use the parent commit[0m
2025-12-04T08:58:37.0853117Z [36;1m  MERGE_BASE=$(git rev-parse HEAD~)[0m
2025-12-04T08:58:37.0853341Z [36;1melse[0m
2025-12-04T08:58:37.0853572Z [36;1m  # otherwise we're on a PR, so use the most recent base commit[0m
2025-12-04T08:58:37.0853910Z [36;1m  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")[0m
2025-12-04T08:58:37.0854158Z [36;1mfi[0m
2025-12-04T08:58:37.0854306Z [36;1m[0m
2025-12-04T08:58:37.0854472Z [36;1mif [[ -z "${MERGE_BASE}" ]]; then[0m
2025-12-04T08:58:37.0854725Z [36;1m  echo "rebuild=true" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T08:58:37.0854950Z [36;1m[0m
2025-12-04T08:58:37.0855272Z [36;1m  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..."[0m
2025-12-04T08:58:37.0855662Z [36;1m  exit 0[0m
2025-12-04T08:58:37.0855815Z [36;1mfi[0m
2025-12-04T08:58:37.0855959Z [36;1m[0m
2025-12-04T08:58:37.0856179Z [36;1mif ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then[0m
2025-12-04T08:58:37.0856659Z [36;1m  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit"[0m
2025-12-04T08:58:37.0857074Z [36;1m  exit 1[0m
2025-12-04T08:58:37.0857229Z [36;1mfi[0m
2025-12-04T08:58:37.0857376Z [36;1m[0m
2025-12-04T08:58:37.0857624Z [36;1mPREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}")[0m
2025-12-04T08:58:37.0858086Z [36;1m# If no image exists but the hash is the same as the previous hash then we should error out here[0m
2025-12-04T08:58:37.0858499Z [36;1mif [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then[0m
2025-12-04T08:58:37.0858978Z [36;1m  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch"[0m
2025-12-04T08:58:37.0859509Z [36;1m  echo "         Will re-build docker image to store in local cache, TTS may be longer"[0m
2025-12-04T08:58:37.0859829Z [36;1mfi[0m
2025-12-04T08:58:37.0859975Z [36;1m[0m
2025-12-04T08:58:37.0860154Z [36;1mecho "rebuild=true" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T08:58:37.0866984Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:58:37.0867268Z env:
2025-12-04T08:58:37.0867436Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:37.0867632Z   DOCKER_BUILD_DIR: .ci/docker
2025-12-04T08:58:37.0867882Z   BASE_REVISION: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T08:58:37.0868529Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:37.0869313Z   DOCKER_TAG: pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:37.0869802Z   DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T08:58:37.0870084Z   DOCKER_PUSH: 
2025-12-04T08:58:37.0870247Z ##[endgroup]
2025-12-04T08:58:37.0895532Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T08:58:37.0896106Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T08:58:37.0898361Z + aws ecr get-login-password --region us-east-1
2025-12-04T08:58:37.0899466Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T08:58:37.5496234Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T08:58:37.5496804Z Configure a credential helper to remove this warning. See
2025-12-04T08:58:37.5497327Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T08:58:37.5497672Z 
2025-12-04T08:58:37.5499056Z Login Succeeded
2025-12-04T08:58:37.5520357Z ++ date +%s
2025-12-04T08:58:37.5531265Z + START_TIME=1764838717
2025-12-04T08:58:37.5534799Z ++ date +%s
2025-12-04T08:58:37.5546486Z + [[ 1764831517 -lt 1764838717 ]]
2025-12-04T08:58:37.5547310Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:37.7355466Z {
2025-12-04T08:58:37.7355770Z 	"schemaVersion": 2,
2025-12-04T08:58:37.7356137Z 	"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
2025-12-04T08:58:37.7356466Z 	"config": {
2025-12-04T08:58:37.7356704Z 		"mediaType": "application/vnd.docker.container.image.v1+json",
2025-12-04T08:58:37.7357007Z 		"size": 34864,
2025-12-04T08:58:37.7357308Z 		"digest": "sha256:add7313791033822205cdb3cf32096534b2cfaa4855bd48119b59000bfe00301"
2025-12-04T08:58:37.7357628Z 	},
2025-12-04T08:58:37.7357772Z 	"layers": [
2025-12-04T08:58:37.7357920Z 		{
2025-12-04T08:58:37.7358145Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7358437Z 			"size": 30447951,
2025-12-04T08:58:37.7358752Z 			"digest": "sha256:63e5bc7682b85ae57a1221210f64d62e7a90b0a30f19af4ca734b8242ae49d63"
2025-12-04T08:58:37.7359077Z 		},
2025-12-04T08:58:37.7359214Z 		{
2025-12-04T08:58:37.7359461Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7359755Z 			"size": 1554,
2025-12-04T08:58:37.7360176Z 			"digest": "sha256:0678d56345c994444b77bb70b1177189d23e794748b1d75ffc45d227c7dea94a"
2025-12-04T08:58:37.7360503Z 		},
2025-12-04T08:58:37.7360632Z 		{
2025-12-04T08:58:37.7360856Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7361141Z 			"size": 313275661,
2025-12-04T08:58:37.7361442Z 			"digest": "sha256:45f5c9ddfce78349dff3d5edfbaa0310ae17311f66abdcd7e00fa21b500e801c"
2025-12-04T08:58:37.7361776Z 		},
2025-12-04T08:58:37.7361907Z 		{
2025-12-04T08:58:37.7362123Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7362406Z 			"size": 787,
2025-12-04T08:58:37.7362688Z 			"digest": "sha256:086b1df51ac1162d9c45698e9dfaf91c6c222c8bd9ab01797ac8f9344bc8044f"
2025-12-04T08:58:37.7363016Z 		},
2025-12-04T08:58:37.7363145Z 		{
2025-12-04T08:58:37.7363361Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7363653Z 			"size": 106,
2025-12-04T08:58:37.7363928Z 			"digest": "sha256:fe8a7b64bf98352f89057bcba66beef2fb44cc05fbd3606abccd8e86cf476234"
2025-12-04T08:58:37.7364326Z 		},
2025-12-04T08:58:37.7364454Z 		{
2025-12-04T08:58:37.7364663Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7364939Z 			"size": 703,
2025-12-04T08:58:37.7365204Z 			"digest": "sha256:7680723e9a578033dd106b45784c639f06cc8adb1f5239ec513d9de01087c1af"
2025-12-04T08:58:37.7365512Z 		},
2025-12-04T08:58:37.7365643Z 		{
2025-12-04T08:58:37.7365858Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7366127Z 			"size": 1216,
2025-12-04T08:58:37.7366401Z 			"digest": "sha256:9c5027aeeb4e3101f48c1d2e400c387110e1009e42497ee801f1b4b7f7edb5c0"
2025-12-04T08:58:37.7366727Z 		},
2025-12-04T08:58:37.7366867Z 		{
2025-12-04T08:58:37.7367080Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7367613Z 			"size": 483,
2025-12-04T08:58:37.7367960Z 			"digest": "sha256:9a56521103600bd37a1e7c1191b5136c2d738c092f8a6701499f7068a32c2628"
2025-12-04T08:58:37.7368269Z 		},
2025-12-04T08:58:37.7368400Z 		{
2025-12-04T08:58:37.7368613Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7368887Z 			"size": 110361875,
2025-12-04T08:58:37.7369164Z 			"digest": "sha256:375c4427e9141269458333b1463fdb219e736fd6231ec1c56c625c48437ace77"
2025-12-04T08:58:37.7369483Z 		},
2025-12-04T08:58:37.7369609Z 		{
2025-12-04T08:58:37.7369824Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7370103Z 			"size": 4961,
2025-12-04T08:58:37.7370381Z 			"digest": "sha256:a86faaa7dbdd70e678e5ea20072637ee42618921ca8f80ca089f789325d4b0c2"
2025-12-04T08:58:37.7370691Z 		},
2025-12-04T08:58:37.7370818Z 		{
2025-12-04T08:58:37.7371190Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7371473Z 			"size": 1755,
2025-12-04T08:58:37.7371755Z 			"digest": "sha256:fb7848686804957915d98f8655ef6da0fe4c521b50a82aefdebf475983505a15"
2025-12-04T08:58:37.7372072Z 		},
2025-12-04T08:58:37.7372197Z 		{
2025-12-04T08:58:37.7372416Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7372703Z 			"size": 724,
2025-12-04T08:58:37.7372966Z 			"digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84"
2025-12-04T08:58:37.7373277Z 		},
2025-12-04T08:58:37.7373406Z 		{
2025-12-04T08:58:37.7373622Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7373900Z 			"size": 543,
2025-12-04T08:58:37.7374167Z 			"digest": "sha256:79dc80f426b29d4ae9157b967050b03e66aa0c4b1295b944a1dd70106be87066"
2025-12-04T08:58:37.7374479Z 		},
2025-12-04T08:58:37.7374597Z 		{
2025-12-04T08:58:37.7374814Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7375092Z 			"size": 3185190117,
2025-12-04T08:58:37.7375381Z 			"digest": "sha256:a13fcc1b90bb9c251ebe7ef2a03c4cb3afa1c8bdafe84f5f85136773059a3735"
2025-12-04T08:58:37.7375705Z 		},
2025-12-04T08:58:37.7375835Z 		{
2025-12-04T08:58:37.7376040Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7376315Z 			"size": 32,
2025-12-04T08:58:37.7376588Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T08:58:37.7376899Z 		},
2025-12-04T08:58:37.7377036Z 		{
2025-12-04T08:58:37.7377253Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7377574Z 			"size": 396,
2025-12-04T08:58:37.7378021Z 			"digest": "sha256:549db4d6c618ecd9534658a233e3c90508f82d8735f965c2786b2eaa078869e5"
2025-12-04T08:58:37.7378518Z 		},
2025-12-04T08:58:37.7378660Z 		{
2025-12-04T08:58:37.7378889Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7379174Z 			"size": 236860,
2025-12-04T08:58:37.7379458Z 			"digest": "sha256:5c63528cb580001e65104f4cb0809bf0673a00f989a7db42fd6d86aa1ec27cee"
2025-12-04T08:58:37.7379774Z 		},
2025-12-04T08:58:37.7379896Z 		{
2025-12-04T08:58:37.7380111Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7380393Z 			"size": 231,
2025-12-04T08:58:37.7380659Z 			"digest": "sha256:75bd83b989a44e4d4119a3f972891025eb0e9ce95cfbe4a0ca5cdbe7130028d6"
2025-12-04T08:58:37.7380979Z 		},
2025-12-04T08:58:37.7381106Z 		{
2025-12-04T08:58:37.7381313Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7381593Z 			"size": 3043497,
2025-12-04T08:58:37.7381876Z 			"digest": "sha256:de6e78970f517178cb91f36cd02bd9ca7b72a08fb82a0f9007516026f258c035"
2025-12-04T08:58:37.7382185Z 		},
2025-12-04T08:58:37.7382314Z 		{
2025-12-04T08:58:37.7382528Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7382805Z 			"size": 1472,
2025-12-04T08:58:37.7383089Z 			"digest": "sha256:e13ed7c7e4736e81dc21af755b3363eb26e4d3b2f1ca988dfe65effa47d8fa42"
2025-12-04T08:58:37.7383539Z 		},
2025-12-04T08:58:37.7383668Z 		{
2025-12-04T08:58:37.7383876Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7384152Z 			"size": 481,
2025-12-04T08:58:37.7384432Z 			"digest": "sha256:6e2949bcb74152577a0f20c38bcb6dd80f5e68427e3e531a80e08c9ecc73a979"
2025-12-04T08:58:37.7384745Z 		},
2025-12-04T08:58:37.7384876Z 		{
2025-12-04T08:58:37.7385097Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7385372Z 			"size": 202,
2025-12-04T08:58:37.7385648Z 			"digest": "sha256:14d69d9aaec70287efd2fd35c4f93e43a29a4098458cc9fca1c93f02ad7356cb"
2025-12-04T08:58:37.7385967Z 		},
2025-12-04T08:58:37.7386095Z 		{
2025-12-04T08:58:37.7386325Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7386600Z 			"size": 607,
2025-12-04T08:58:37.7386968Z 			"digest": "sha256:5c02769dd8e5bba2f7f5fd84bde9595fcb3bdbffcae497503fa846f9b5e78bf5"
2025-12-04T08:58:37.7387287Z 		},
2025-12-04T08:58:37.7387412Z 		{
2025-12-04T08:58:37.7387638Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7387916Z 			"size": 7889619584,
2025-12-04T08:58:37.7388200Z 			"digest": "sha256:35041ce524ac4afec40ecd73b1393c830614f1f79d43a6439767a6c7d5b7027b"
2025-12-04T08:58:37.7388511Z 		},
2025-12-04T08:58:37.7388631Z 		{
2025-12-04T08:58:37.7388845Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7389118Z 			"size": 830,
2025-12-04T08:58:37.7389376Z 			"digest": "sha256:2fa92dc5885e080e049ceb4139288b6c0e39fab34256945708b08ea55a1f7a0b"
2025-12-04T08:58:37.7389686Z 		},
2025-12-04T08:58:37.7389811Z 		{
2025-12-04T08:58:37.7390042Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7390319Z 			"size": 33451739,
2025-12-04T08:58:37.7390608Z 			"digest": "sha256:2b85eafbd92a0e70a0a70154ad8bf4584095e576d95873368f30373f5966714a"
2025-12-04T08:58:37.7390918Z 		},
2025-12-04T08:58:37.7391042Z 		{
2025-12-04T08:58:37.7391252Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7391532Z 			"size": 104,
2025-12-04T08:58:37.7391803Z 			"digest": "sha256:ff755a4ddad7880f23c6b767d432d6f1eafdb62b3ea18f8a98e22c441c099fcb"
2025-12-04T08:58:37.7392127Z 		},
2025-12-04T08:58:37.7392255Z 		{
2025-12-04T08:58:37.7392461Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7392749Z 			"size": 1496,
2025-12-04T08:58:37.7393020Z 			"digest": "sha256:09eb41bdf42d8605b57b2363348154140904dec914b34a67298b82122bfce2b3"
2025-12-04T08:58:37.7393324Z 		},
2025-12-04T08:58:37.7393451Z 		{
2025-12-04T08:58:37.7393665Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7393944Z 			"size": 458787828,
2025-12-04T08:58:37.7394224Z 			"digest": "sha256:11ede4d59e935e62f41b33220fe871794ab5e57ce724173b713368977683bcf6"
2025-12-04T08:58:37.7394534Z 		},
2025-12-04T08:58:37.7394663Z 		{
2025-12-04T08:58:37.7394876Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7395167Z 			"size": 164,
2025-12-04T08:58:37.7395440Z 			"digest": "sha256:1283cd8f801a142172f3ab76fd472df8583223d9437de3e4d18d8cf98ea3fa98"
2025-12-04T08:58:37.7395750Z 		},
2025-12-04T08:58:37.7395879Z 		{
2025-12-04T08:58:37.7396097Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7396369Z 			"size": 346,
2025-12-04T08:58:37.7396637Z 			"digest": "sha256:024fa855425fa524ad4500660cf61d53be62b99556d31b8b280d14caba434a35"
2025-12-04T08:58:37.7396957Z 		},
2025-12-04T08:58:37.7397087Z 		{
2025-12-04T08:58:37.7397298Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7397577Z 			"size": 32,
2025-12-04T08:58:37.7397855Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T08:58:37.7398167Z 		},
2025-12-04T08:58:37.7398295Z 		{
2025-12-04T08:58:37.7398509Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7398914Z 			"size": 106,
2025-12-04T08:58:37.7399187Z 			"digest": "sha256:303e6747a62efecf5efa1f97d0e66b40a3b39da8d79a51f75b89f4c92ae7ec52"
2025-12-04T08:58:37.7399514Z 		},
2025-12-04T08:58:37.7399642Z 		{
2025-12-04T08:58:37.7399852Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7400213Z 			"size": 424,
2025-12-04T08:58:37.7400480Z 			"digest": "sha256:3017cdf4838bcc9a33daebc07487f8ae1f6bd6e7ce8322c14f5480e8db9ef90e"
2025-12-04T08:58:37.7400800Z 		},
2025-12-04T08:58:37.7400927Z 		{
2025-12-04T08:58:37.7401137Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7401408Z 			"size": 19309374,
2025-12-04T08:58:37.7401695Z 			"digest": "sha256:6b6cd1c358e886dc6ed7fd46ac4bcc1a0a73b7b1301739ea1953478ee5d83f50"
2025-12-04T08:58:37.7402012Z 		},
2025-12-04T08:58:37.7402138Z 		{
2025-12-04T08:58:37.7402444Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7402749Z 			"size": 108,
2025-12-04T08:58:37.7403019Z 			"digest": "sha256:b2dd045011241d1cf8889e2a7369d9fe4844dfe15529b520ccd6a59bd3c1532e"
2025-12-04T08:58:37.7403332Z 		},
2025-12-04T08:58:37.7403461Z 		{
2025-12-04T08:58:37.7403671Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7403954Z 			"size": 827,
2025-12-04T08:58:37.7404228Z 			"digest": "sha256:55adc51fe5897031d4cf2f2b8fd162213f6e46a52848630c616606271b97952e"
2025-12-04T08:58:37.7404541Z 		},
2025-12-04T08:58:37.7404663Z 		{
2025-12-04T08:58:37.7404878Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7405157Z 			"size": 724,
2025-12-04T08:58:37.7405426Z 			"digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84"
2025-12-04T08:58:37.7405741Z 		},
2025-12-04T08:58:37.7405867Z 		{
2025-12-04T08:58:37.7406076Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7406354Z 			"size": 149,
2025-12-04T08:58:37.7406621Z 			"digest": "sha256:a43ca0e4b837964b12b7469194cfe939c26de027298040028975324dce25938a"
2025-12-04T08:58:37.7406937Z 		},
2025-12-04T08:58:37.7407068Z 		{
2025-12-04T08:58:37.7407286Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7407558Z 			"size": 138,
2025-12-04T08:58:37.7407832Z 			"digest": "sha256:b7212f17fd1404837fcfdd086dd0e2667931e4db377d45d8d89a44390c84e11d"
2025-12-04T08:58:37.7408149Z 		},
2025-12-04T08:58:37.7408283Z 		{
2025-12-04T08:58:37.7408496Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7408773Z 			"size": 141,
2025-12-04T08:58:37.7409047Z 			"digest": "sha256:083e42cac090e6486c35f392b64ee54448f5e4aa947003aeb3e1f92c8ea5c099"
2025-12-04T08:58:37.7409358Z 		},
2025-12-04T08:58:37.7409489Z 		{
2025-12-04T08:58:37.7409702Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7409978Z 			"size": 32,
2025-12-04T08:58:37.7410256Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T08:58:37.7410577Z 		},
2025-12-04T08:58:37.7410701Z 		{
2025-12-04T08:58:37.7410913Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7411189Z 			"size": 223,
2025-12-04T08:58:37.7411455Z 			"digest": "sha256:0a00b784a4aac341795729b254f7edd09e811b7f51d0c58e0e6bfeeee6940503"
2025-12-04T08:58:37.7411767Z 		},
2025-12-04T08:58:37.7411891Z 		{
2025-12-04T08:58:37.7412103Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7412384Z 			"size": 255,
2025-12-04T08:58:37.7412652Z 			"digest": "sha256:c6173c779f7ba143a21214ea5f032b141863a37ceb4c0ac01d3248c216ce5241"
2025-12-04T08:58:37.7412962Z 		},
2025-12-04T08:58:37.7413083Z 		{
2025-12-04T08:58:37.7413293Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7413568Z 			"size": 145520672,
2025-12-04T08:58:37.7413845Z 			"digest": "sha256:ed3d1e3387b924585c332bf1bc252fa159cd0d25256a874043ff0141b1ab5ff7"
2025-12-04T08:58:37.7414160Z 		},
2025-12-04T08:58:37.7414404Z 		{
2025-12-04T08:58:37.7414612Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7414888Z 			"size": 106,
2025-12-04T08:58:37.7444380Z 			"digest": "sha256:b29343478586aeee19d2a622661716f6f1591280c890f49b727a8da13a610784"
2025-12-04T08:58:37.7444739Z 		},
2025-12-04T08:58:37.7444872Z 		{
2025-12-04T08:58:37.7445104Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7445479Z 			"size": 312293530,
2025-12-04T08:58:37.7445782Z 			"digest": "sha256:c6f0520487fb506bc4601fd84d5f28d8a76b203e004731e4b2067c2ab1a14e0b"
2025-12-04T08:58:37.7446102Z 		},
2025-12-04T08:58:37.7446225Z 		{
2025-12-04T08:58:37.7446450Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7446737Z 			"size": 3058011133,
2025-12-04T08:58:37.7447210Z 			"digest": "sha256:148171691cd4c4d20310d490d4b4dd903490d04ea07fb8f7e668a28768683e9a"
2025-12-04T08:58:37.7447536Z 		},
2025-12-04T08:58:37.7447672Z 		{
2025-12-04T08:58:37.7447917Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7448210Z 			"size": 129,
2025-12-04T08:58:37.7448499Z 			"digest": "sha256:2c666d30ed77fff9ff1167d41cd645dad98280fcbe941f5bc3828c7ae66b1287"
2025-12-04T08:58:37.7448812Z 		},
2025-12-04T08:58:37.7448936Z 		{
2025-12-04T08:58:37.7449153Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7449434Z 			"size": 880,
2025-12-04T08:58:37.7449707Z 			"digest": "sha256:5d8d3a0a98e012c5068e0f3bae5a03e3148ecf2d063634eee4c9241a1e3fdfb5"
2025-12-04T08:58:37.7450021Z 		},
2025-12-04T08:58:37.7450150Z 		{
2025-12-04T08:58:37.7450405Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7450677Z 			"size": 724,
2025-12-04T08:58:37.7450941Z 			"digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84"
2025-12-04T08:58:37.7451242Z 		},
2025-12-04T08:58:37.7451367Z 		{
2025-12-04T08:58:37.7451583Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7451856Z 			"size": 139,
2025-12-04T08:58:37.7452137Z 			"digest": "sha256:b06bafce9e817295d8127207747c80aa18e04392ff0875844fc30a1e794a8a0c"
2025-12-04T08:58:37.7452453Z 		},
2025-12-04T08:58:37.7452572Z 		{
2025-12-04T08:58:37.7452790Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7453082Z 			"size": 32,
2025-12-04T08:58:37.7453368Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T08:58:37.7453687Z 		},
2025-12-04T08:58:37.7453813Z 		{
2025-12-04T08:58:37.7454030Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7454300Z 			"size": 159,
2025-12-04T08:58:37.7454576Z 			"digest": "sha256:15e0d7e4590d3d8f598d05aec3a92f891bf8b4605bcc38cc2de852b6014ef8f3"
2025-12-04T08:58:37.7454904Z 		},
2025-12-04T08:58:37.7455031Z 		{
2025-12-04T08:58:37.7455248Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7455529Z 			"size": 1011,
2025-12-04T08:58:37.7455801Z 			"digest": "sha256:a514bd1add3164d8d7ca99aa19294c4ed8b97b074635d98714c4f598a959f4cd"
2025-12-04T08:58:37.7456125Z 		},
2025-12-04T08:58:37.7456255Z 		{
2025-12-04T08:58:37.7456465Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7456745Z 			"size": 724,
2025-12-04T08:58:37.7457019Z 			"digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84"
2025-12-04T08:58:37.7457331Z 		},
2025-12-04T08:58:37.7457450Z 		{
2025-12-04T08:58:37.7457663Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7457938Z 			"size": 134,
2025-12-04T08:58:37.7458207Z 			"digest": "sha256:57b84ee6000204f27a1d9bca199b19be4c86ecd324540dbdf239c56a6c3b34ea"
2025-12-04T08:58:37.7458524Z 		},
2025-12-04T08:58:37.7458649Z 		{
2025-12-04T08:58:37.7458860Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7459139Z 			"size": 32,
2025-12-04T08:58:37.7459553Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T08:58:37.7459865Z 		},
2025-12-04T08:58:37.7459989Z 		{
2025-12-04T08:58:37.7460204Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7460481Z 			"size": 157,
2025-12-04T08:58:37.7460753Z 			"digest": "sha256:b8babeff6d817a5961dddc15c6bdfdbd05da187fae75d5804015f99fd7c066d8"
2025-12-04T08:58:37.7461072Z 		},
2025-12-04T08:58:37.7461206Z 		{
2025-12-04T08:58:37.7461417Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7461693Z 			"size": 602,
2025-12-04T08:58:37.7461965Z 			"digest": "sha256:83779ddf6a85ab387f64a45f274cba245b69e4fd1931ff0b5d7d3efd4b7a43bc"
2025-12-04T08:58:37.7462276Z 		},
2025-12-04T08:58:37.7462401Z 		{
2025-12-04T08:58:37.7462695Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7462969Z 			"size": 724,
2025-12-04T08:58:37.7463233Z 			"digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84"
2025-12-04T08:58:37.7463556Z 		},
2025-12-04T08:58:37.7463675Z 		{
2025-12-04T08:58:37.7463887Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7464163Z 			"size": 155,
2025-12-04T08:58:37.7464432Z 			"digest": "sha256:8b7620c0d736cc79381207ce5afe2af90f0cd7f0cd394577d2c9520d7f74762f"
2025-12-04T08:58:37.7464757Z 		},
2025-12-04T08:58:37.7464892Z 		{
2025-12-04T08:58:37.7465120Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7465408Z 			"size": 32,
2025-12-04T08:58:37.7465698Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T08:58:37.7466026Z 		},
2025-12-04T08:58:37.7466152Z 		{
2025-12-04T08:58:37.7466379Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7466670Z 			"size": 188,
2025-12-04T08:58:37.7466954Z 			"digest": "sha256:3bcfa090e4efd3677425f76baea9f1e0c50a75d8c6b5713ec05310f1dff24539"
2025-12-04T08:58:37.7467282Z 		},
2025-12-04T08:58:37.7467412Z 		{
2025-12-04T08:58:37.7467629Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7467918Z 			"size": 1370,
2025-12-04T08:58:37.7468204Z 			"digest": "sha256:eb0504ec4d9218a79896b604f73dc0ea5a0f96266ad9c2cdbbbe5f0f18222694"
2025-12-04T08:58:37.7468526Z 		},
2025-12-04T08:58:37.7468648Z 		{
2025-12-04T08:58:37.7468876Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7469161Z 			"size": 32,
2025-12-04T08:58:37.7469432Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T08:58:37.7469748Z 		},
2025-12-04T08:58:37.7469875Z 		{
2025-12-04T08:58:37.7470083Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7470361Z 			"size": 136,
2025-12-04T08:58:37.7470640Z 			"digest": "sha256:15d0fec09d7b196a1462d51516ee90fc3443ba178d3e56d59cacf32146b4321d"
2025-12-04T08:58:37.7470950Z 		},
2025-12-04T08:58:37.7471076Z 		{
2025-12-04T08:58:37.7471289Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7471567Z 			"size": 528,
2025-12-04T08:58:37.7471841Z 			"digest": "sha256:cca81fcc62a949959ca4dd3c9056fb293d548ef8607127eeeef6cfd3a8897ca8"
2025-12-04T08:58:37.7472162Z 		},
2025-12-04T08:58:37.7472288Z 		{
2025-12-04T08:58:37.7472496Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7472789Z 			"size": 32,
2025-12-04T08:58:37.7473064Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T08:58:37.7473373Z 		},
2025-12-04T08:58:37.7473509Z 		{
2025-12-04T08:58:37.7473729Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7474000Z 			"size": 104,
2025-12-04T08:58:37.7474283Z 			"digest": "sha256:b0b8f9b5c6ab98db9cd830dc584e1b6aec9add139e4cc48d8c243d36691e25b4"
2025-12-04T08:58:37.7474609Z 		},
2025-12-04T08:58:37.7474732Z 		{
2025-12-04T08:58:37.7475034Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7475314Z 			"size": 435,
2025-12-04T08:58:37.7475581Z 			"digest": "sha256:0606ca4d47a8a70e91e92b03ca51a85e731641b09342136a54ef2f2a6d9dfb44"
2025-12-04T08:58:37.7475887Z 		},
2025-12-04T08:58:37.7476015Z 		{
2025-12-04T08:58:37.7476233Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7476503Z 			"size": 32,
2025-12-04T08:58:37.7476779Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T08:58:37.7477097Z 		},
2025-12-04T08:58:37.7477218Z 		{
2025-12-04T08:58:37.7477442Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7477726Z 			"size": 109,
2025-12-04T08:58:37.7478089Z 			"digest": "sha256:2f80a4e1b3b95ed67bb781ea787e8a63e46de79117d9d8e65c257072b38afa2d"
2025-12-04T08:58:37.7478418Z 		},
2025-12-04T08:58:37.7478551Z 		{
2025-12-04T08:58:37.7478765Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7479070Z 			"size": 1896,
2025-12-04T08:58:37.7479356Z 			"digest": "sha256:35c916fb1bd057e517dcab78c3a2a018e68096d8993892ad84f47562d37ae352"
2025-12-04T08:58:37.7479673Z 		},
2025-12-04T08:58:37.7479807Z 		{
2025-12-04T08:58:37.7480087Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7480382Z 			"size": 197526165,
2025-12-04T08:58:37.7480667Z 			"digest": "sha256:195537b7dafc96192f768323b1a8cc2a914d41959849b73198579576b0872a44"
2025-12-04T08:58:37.7480992Z 		},
2025-12-04T08:58:37.7481125Z 		{
2025-12-04T08:58:37.7481341Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7481623Z 			"size": 106,
2025-12-04T08:58:37.7481903Z 			"digest": "sha256:dc454fd3967e5735b2498b7f1d958a2c626987d5e4ce225ca98da3cd945b59f3"
2025-12-04T08:58:37.7482225Z 		},
2025-12-04T08:58:37.7482352Z 		{
2025-12-04T08:58:37.7482572Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7482854Z 			"size": 165,
2025-12-04T08:58:37.7483124Z 			"digest": "sha256:701b34f115fa897181c046dc37288e87cbc3ad74c36a9e2224b5bfe7c5703afb"
2025-12-04T08:58:37.7483441Z 		},
2025-12-04T08:58:37.7483566Z 		{
2025-12-04T08:58:37.7483788Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7484070Z 			"size": 7944,
2025-12-04T08:58:37.7484349Z 			"digest": "sha256:39cefc00ffedebc9098261c798408b87a20c95a88fccb110594077f48dadf760"
2025-12-04T08:58:37.7484660Z 		},
2025-12-04T08:58:37.7484785Z 		{
2025-12-04T08:58:37.7485002Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7485273Z 			"size": 8071,
2025-12-04T08:58:37.7485545Z 			"digest": "sha256:6ae51eb61a325b2c2995a5088c81aa20821b75be65b5aa722c7c40556b5d03ea"
2025-12-04T08:58:37.7485856Z 		},
2025-12-04T08:58:37.7485982Z 		{
2025-12-04T08:58:37.7486214Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7486492Z 			"size": 304,
2025-12-04T08:58:37.7486770Z 			"digest": "sha256:1fd5341e66dfc0c1ae23af014641a92a6fd02640c528fe6d4dc55921ed659a26"
2025-12-04T08:58:37.7487081Z 		},
2025-12-04T08:58:37.7487206Z 		{
2025-12-04T08:58:37.7487423Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7487700Z 			"size": 13364291,
2025-12-04T08:58:37.7487991Z 			"digest": "sha256:72a7c87e35e40ab796f90aee1b51add7902f0cdc44406d2505b6c6a1f55a8da6"
2025-12-04T08:58:37.7488317Z 		},
2025-12-04T08:58:37.7488440Z 		{
2025-12-04T08:58:37.7488653Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7488932Z 			"size": 108,
2025-12-04T08:58:37.7489204Z 			"digest": "sha256:ec36862ac98ebaac52ee1a8b1d162d45bd0e3bf59ae7e19c8f80ad3960b4c600"
2025-12-04T08:58:37.7489530Z 		},
2025-12-04T08:58:37.7489660Z 		{
2025-12-04T08:58:37.7489873Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7490157Z 			"size": 54145699,
2025-12-04T08:58:37.7490535Z 			"digest": "sha256:05ddbf246e8add0e293474dbf88bb028d5a295a25ac59e8648a18db644377773"
2025-12-04T08:58:37.7490855Z 		},
2025-12-04T08:58:37.7490980Z 		{
2025-12-04T08:58:37.7491195Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T08:58:37.7491476Z 			"size": 32,
2025-12-04T08:58:37.7491746Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T08:58:37.7492063Z 		}
2025-12-04T08:58:37.7492195Z 	]
2025-12-04T08:58:37.7492318Z }
2025-12-04T08:58:37.7492477Z + exit 0
2025-12-04T08:58:37.7528789Z ##[group]Run set -eux
2025-12-04T08:58:37.7529004Z [36;1mset -eux[0m
2025-12-04T08:58:37.7529295Z [36;1m# It's ok if this steps fails, it would then be an anonymous user like what we used to have[0m
2025-12-04T08:58:37.7530244Z [36;1maws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true[0m
2025-12-04T08:58:37.7538336Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:58:37.7538627Z env:
2025-12-04T08:58:37.7538789Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:37.7538979Z ##[endgroup]
2025-12-04T08:58:37.7577462Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token
2025-12-04T08:58:37.7578308Z + jq --raw-output .SecretString
2025-12-04T08:58:37.7579349Z + jq -r .docker_hub_readonly_token
2025-12-04T08:58:37.7580545Z + docker login --username pytorchbot --password-stdin
2025-12-04T08:58:38.2670162Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T08:58:38.2670839Z Configure a credential helper to remove this warning. See
2025-12-04T08:58:38.2671385Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T08:58:38.2671736Z 
2025-12-04T08:58:38.2672048Z Login Succeeded
2025-12-04T08:58:38.2752807Z ##[group]Run tag=${ECR_DOCKER_IMAGE##*:}
2025-12-04T08:58:38.2753118Z [36;1mtag=${ECR_DOCKER_IMAGE##*:}[0m
2025-12-04T08:58:38.2753436Z [36;1mecho "docker pull ghcr.io/pytorch/ci-image:${tag/:/-}"[0m
2025-12-04T08:58:38.2761282Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:58:38.2761559Z env:
2025-12-04T08:58:38.2761717Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:38.2762315Z   ECR_DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:38.2762924Z ##[endgroup]
2025-12-04T08:58:38.2791014Z docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:38.2830066Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main
2025-12-04T08:58:38.2830417Z with:
2025-12-04T08:58:38.2830979Z   docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:38.2831672Z   docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T08:58:38.2831942Z env:
2025-12-04T08:58:38.2832098Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:38.2832284Z ##[endgroup]
2025-12-04T08:58:38.2845217Z ##[group]Run set -x
2025-12-04T08:58:38.2845413Z [36;1mset -x[0m
2025-12-04T08:58:38.2845566Z [36;1mset +e[0m
2025-12-04T08:58:38.2845717Z [36;1m[0m
2025-12-04T08:58:38.2845868Z [36;1mlogin() {[0m
2025-12-04T08:58:38.2846208Z [36;1m  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1"[0m
2025-12-04T08:58:38.2846584Z [36;1m}[0m
2025-12-04T08:58:38.2846732Z [36;1m[0m
2025-12-04T08:58:38.2846917Z [36;1mretry () {[0m
2025-12-04T08:58:38.2847104Z [36;1m  $*  || (sleep 1 && $*) || (sleep 2 && $*)[0m
2025-12-04T08:58:38.2847323Z [36;1m}[0m
2025-12-04T08:58:38.2847469Z [36;1m[0m
2025-12-04T08:58:38.2847636Z [36;1mretry login "${DOCKER_REGISTRY}"[0m
2025-12-04T08:58:38.2847849Z [36;1m[0m
2025-12-04T08:58:38.2848358Z [36;1mIMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024')[0m
2025-12-04T08:58:38.2848823Z [36;1mecho "Compressed size of image in MB: ${IMAGE_SIZE}"[0m
2025-12-04T08:58:38.2849086Z [36;1m[0m
2025-12-04T08:58:38.2849234Z [36;1mset -e[0m
2025-12-04T08:58:38.2849477Z [36;1m# ignore output since only exit code is used for conditional[0m
2025-12-04T08:58:38.2849821Z [36;1m# only pull docker image if it's not available locally[0m
2025-12-04T08:58:38.2850199Z [36;1mif ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then[0m
2025-12-04T08:58:38.2850558Z [36;1m  retry docker pull "${DOCKER_IMAGE}"[0m
2025-12-04T08:58:38.2850782Z [36;1mfi[0m
2025-12-04T08:58:38.2857666Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:58:38.2857931Z env:
2025-12-04T08:58:38.2858088Z   GIT_DEFAULT_BRANCH: main
2025-12-04T08:58:38.2858665Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:38.2859335Z   DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T08:58:38.2859609Z ##[endgroup]
2025-12-04T08:58:38.2885279Z + set +e
2025-12-04T08:58:38.2885745Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T08:58:38.2886383Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T08:58:38.2888803Z + aws ecr get-login-password --region us-east-1
2025-12-04T08:58:38.2889962Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T08:58:38.7437128Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T08:58:38.7437830Z Configure a credential helper to remove this warning. See
2025-12-04T08:58:38.7438444Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T08:58:38.7438875Z 
2025-12-04T08:58:38.7441833Z Login Succeeded
2025-12-04T08:58:38.7470740Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:38.7472192Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024'
2025-12-04T08:58:38.9303598Z + IMAGE_SIZE=15091.581844329834
2025-12-04T08:58:38.9303973Z + echo 'Compressed size of image in MB: 15091.581844329834'
2025-12-04T08:58:38.9304331Z + set -e
2025-12-04T08:58:38.9305082Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:38.9306265Z Compressed size of image in MB: 15091.581844329834
2025-12-04T08:58:38.9422142Z + retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:38.9423173Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:58:39.1547473Z pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a: Pulling from pytorch/ci-image
2025-12-04T08:58:39.1550819Z 63e5bc7682b8: Pulling fs layer
2025-12-04T08:58:39.1551316Z 0678d56345c9: Pulling fs layer
2025-12-04T08:58:39.1551740Z 45f5c9ddfce7: Pulling fs layer
2025-12-04T08:58:39.1552248Z 086b1df51ac1: Pulling fs layer
2025-12-04T08:58:39.1552646Z fe8a7b64bf98: Pulling fs layer
2025-12-04T08:58:39.1553044Z 7680723e9a57: Pulling fs layer
2025-12-04T08:58:39.1553473Z 9c5027aeeb4e: Pulling fs layer
2025-12-04T08:58:39.1553877Z 9a5652110360: Pulling fs layer
2025-12-04T08:58:39.1554286Z 375c4427e914: Pulling fs layer
2025-12-04T08:58:39.1554660Z a86faaa7dbdd: Pulling fs layer
2025-12-04T08:58:39.1554999Z fb7848686804: Pulling fs layer
2025-12-04T08:58:39.1555329Z 3541df015cdb: Pulling fs layer
2025-12-04T08:58:39.1555942Z 79dc80f426b2: Pulling fs layer
2025-12-04T08:58:39.1556305Z a13fcc1b90bb: Pulling fs layer
2025-12-04T08:58:39.1556629Z 086b1df51ac1: Waiting
2025-12-04T08:58:39.1556942Z 4f4fb700ef54: Pulling fs layer
2025-12-04T08:58:39.1557289Z 549db4d6c618: Pulling fs layer
2025-12-04T08:58:39.1557607Z 5c63528cb580: Pulling fs layer
2025-12-04T08:58:39.1557927Z fe8a7b64bf98: Waiting
2025-12-04T08:58:39.1558215Z 75bd83b989a4: Pulling fs layer
2025-12-04T08:58:39.1558541Z 7680723e9a57: Waiting
2025-12-04T08:58:39.1558798Z 9c5027aeeb4e: Waiting
2025-12-04T08:58:39.1558969Z 3541df015cdb: Waiting
2025-12-04T08:58:39.1559130Z 9a5652110360: Waiting
2025-12-04T08:58:39.1559305Z de6e78970f51: Pulling fs layer
2025-12-04T08:58:39.1559514Z e13ed7c7e473: Pulling fs layer
2025-12-04T08:58:39.1559712Z 6e2949bcb741: Pulling fs layer
2025-12-04T08:58:39.1559976Z 14d69d9aaec7: Pulling fs layer
2025-12-04T08:58:39.1560174Z 79dc80f426b2: Waiting
2025-12-04T08:58:39.1560347Z 5c02769dd8e5: Pulling fs layer
2025-12-04T08:58:39.1560530Z a13fcc1b90bb: Waiting
2025-12-04T08:58:39.1560700Z a86faaa7dbdd: Waiting
2025-12-04T08:58:39.1560868Z 35041ce524ac: Pulling fs layer
2025-12-04T08:58:39.1561052Z 2fa92dc5885e: Pulling fs layer
2025-12-04T08:58:39.1561233Z 549db4d6c618: Waiting
2025-12-04T08:58:39.1561404Z 2b85eafbd92a: Pulling fs layer
2025-12-04T08:58:39.1561578Z 5c63528cb580: Waiting
2025-12-04T08:58:39.1561731Z fb7848686804: Waiting
2025-12-04T08:58:39.1561884Z 375c4427e914: Waiting
2025-12-04T08:58:39.1562048Z 4f4fb700ef54: Waiting
2025-12-04T08:58:39.1562200Z 5c02769dd8e5: Waiting
2025-12-04T08:58:39.1562353Z 6e2949bcb741: Waiting
2025-12-04T08:58:39.1562505Z 35041ce524ac: Waiting
2025-12-04T08:58:39.1562662Z 14d69d9aaec7: Waiting
2025-12-04T08:58:39.1562899Z ff755a4ddad7: Pulling fs layer
2025-12-04T08:58:39.1563079Z e13ed7c7e473: Waiting
2025-12-04T08:58:39.1563230Z 2fa92dc5885e: Waiting
2025-12-04T08:58:39.1563394Z 09eb41bdf42d: Pulling fs layer
2025-12-04T08:58:39.1563580Z 11ede4d59e93: Pulling fs layer
2025-12-04T08:58:39.1563759Z ff755a4ddad7: Waiting
2025-12-04T08:58:39.1563913Z 75bd83b989a4: Waiting
2025-12-04T08:58:39.1564080Z 1283cd8f801a: Pulling fs layer
2025-12-04T08:58:39.1564258Z 024fa855425f: Pulling fs layer
2025-12-04T08:58:39.1564452Z 2b85eafbd92a: Waiting
2025-12-04T08:58:39.1564610Z 024fa855425f: Waiting
2025-12-04T08:58:39.1564811Z 1283cd8f801a: Waiting
2025-12-04T08:58:39.1564972Z 303e6747a62e: Pulling fs layer
2025-12-04T08:58:39.1565155Z 3017cdf4838b: Pulling fs layer
2025-12-04T08:58:39.1565334Z 6b6cd1c358e8: Pulling fs layer
2025-12-04T08:58:39.1565509Z 303e6747a62e: Waiting
2025-12-04T08:58:39.1565675Z b2dd04501124: Pulling fs layer
2025-12-04T08:58:39.1565852Z 3017cdf4838b: Waiting
2025-12-04T08:58:39.1566209Z 6b6cd1c358e8: Waiting
2025-12-04T08:58:39.1566377Z 11ede4d59e93: Waiting
2025-12-04T08:58:39.1566540Z 55adc51fe589: Pulling fs layer
2025-12-04T08:58:39.1566730Z 09eb41bdf42d: Waiting
2025-12-04T08:58:39.1566885Z b2dd04501124: Waiting
2025-12-04T08:58:39.1567047Z a43ca0e4b837: Pulling fs layer
2025-12-04T08:58:39.1567234Z b7212f17fd14: Pulling fs layer
2025-12-04T08:58:39.1567411Z a43ca0e4b837: Waiting
2025-12-04T08:58:39.1567574Z 083e42cac090: Pulling fs layer
2025-12-04T08:58:39.1567745Z b7212f17fd14: Waiting
2025-12-04T08:58:39.1567903Z 083e42cac090: Waiting
2025-12-04T08:58:39.1568075Z 55adc51fe589: Waiting
2025-12-04T08:58:39.1568236Z 0a00b784a4aa: Pulling fs layer
2025-12-04T08:58:39.1568532Z c6173c779f7b: Pulling fs layer
2025-12-04T08:58:39.1568720Z ed3d1e3387b9: Pulling fs layer
2025-12-04T08:58:39.1568908Z b29343478586: Pulling fs layer
2025-12-04T08:58:39.1569128Z c6173c779f7b: Waiting
2025-12-04T08:58:39.1569297Z ed3d1e3387b9: Waiting
2025-12-04T08:58:39.1569510Z 0a00b784a4aa: Waiting
2025-12-04T08:58:39.1569672Z c6f0520487fb: Pulling fs layer
2025-12-04T08:58:39.1569857Z 148171691cd4: Pulling fs layer
2025-12-04T08:58:39.1570099Z 2c666d30ed77: Pulling fs layer
2025-12-04T08:58:39.1570286Z 5d8d3a0a98e0: Pulling fs layer
2025-12-04T08:58:39.1570466Z b29343478586: Waiting
2025-12-04T08:58:39.1570725Z 2c666d30ed77: Waiting
2025-12-04T08:58:39.1570890Z b06bafce9e81: Pulling fs layer
2025-12-04T08:58:39.1571086Z 5d8d3a0a98e0: Waiting
2025-12-04T08:58:39.1571251Z 148171691cd4: Waiting
2025-12-04T08:58:39.1571405Z b06bafce9e81: Waiting
2025-12-04T08:58:39.1571569Z 15e0d7e4590d: Pulling fs layer
2025-12-04T08:58:39.1571753Z c6f0520487fb: Waiting
2025-12-04T08:58:39.1571911Z a514bd1add31: Pulling fs layer
2025-12-04T08:58:39.1572096Z a514bd1add31: Waiting
2025-12-04T08:58:39.1572258Z 57b84ee60002: Pulling fs layer
2025-12-04T08:58:39.1572431Z 15e0d7e4590d: Waiting
2025-12-04T08:58:39.1572578Z 57b84ee60002: Waiting
2025-12-04T08:58:39.1572738Z b8babeff6d81: Pulling fs layer
2025-12-04T08:58:39.1572927Z 83779ddf6a85: Pulling fs layer
2025-12-04T08:58:39.1573101Z b8babeff6d81: Waiting
2025-12-04T08:58:39.1573263Z 8b7620c0d736: Pulling fs layer
2025-12-04T08:58:39.1573450Z 3bcfa090e4ef: Pulling fs layer
2025-12-04T08:58:39.1573623Z 83779ddf6a85: Waiting
2025-12-04T08:58:39.1573787Z eb0504ec4d92: Pulling fs layer
2025-12-04T08:58:39.1573972Z 8b7620c0d736: Waiting
2025-12-04T08:58:39.1574141Z 15d0fec09d7b: Pulling fs layer
2025-12-04T08:58:39.1574330Z cca81fcc62a9: Pulling fs layer
2025-12-04T08:58:39.1574511Z 3bcfa090e4ef: Waiting
2025-12-04T08:58:39.1574665Z 15d0fec09d7b: Waiting
2025-12-04T08:58:39.1574827Z eb0504ec4d92: Waiting
2025-12-04T08:58:39.1574991Z b0b8f9b5c6ab: Pulling fs layer
2025-12-04T08:58:39.1575164Z cca81fcc62a9: Waiting
2025-12-04T08:58:39.1575330Z 0606ca4d47a8: Pulling fs layer
2025-12-04T08:58:39.1575509Z b0b8f9b5c6ab: Waiting
2025-12-04T08:58:39.1575673Z 2f80a4e1b3b9: Pulling fs layer
2025-12-04T08:58:39.1575855Z 35c916fb1bd0: Pulling fs layer
2025-12-04T08:58:39.1576052Z 195537b7dafc: Pulling fs layer
2025-12-04T08:58:39.1576236Z 0606ca4d47a8: Waiting
2025-12-04T08:58:39.1576386Z 2f80a4e1b3b9: Waiting
2025-12-04T08:58:39.1576552Z dc454fd3967e: Pulling fs layer
2025-12-04T08:58:39.1576730Z 35c916fb1bd0: Waiting
2025-12-04T08:58:39.1576947Z 701b34f115fa: Pulling fs layer
2025-12-04T08:58:39.1577133Z dc454fd3967e: Waiting
2025-12-04T08:58:39.1588331Z 39cefc00ffed: Pulling fs layer
2025-12-04T08:58:39.1588560Z 701b34f115fa: Waiting
2025-12-04T08:58:39.1588759Z 6ae51eb61a32: Pulling fs layer
2025-12-04T08:58:39.1588971Z 1fd5341e66df: Pulling fs layer
2025-12-04T08:58:39.1589160Z 72a7c87e35e4: Pulling fs layer
2025-12-04T08:58:39.1589351Z ec36862ac98e: Pulling fs layer
2025-12-04T08:58:39.1589524Z 1fd5341e66df: Waiting
2025-12-04T08:58:39.1589696Z 39cefc00ffed: Waiting
2025-12-04T08:58:39.1589858Z 6ae51eb61a32: Waiting
2025-12-04T08:58:39.1590004Z 72a7c87e35e4: Waiting
2025-12-04T08:58:39.1590167Z 05ddbf246e8a: Pulling fs layer
2025-12-04T08:58:39.1590350Z ec36862ac98e: Waiting
2025-12-04T08:58:39.1590681Z 05ddbf246e8a: Waiting
2025-12-04T08:58:39.2277660Z 0678d56345c9: Verifying Checksum
2025-12-04T08:58:39.2278318Z 0678d56345c9: Download complete
2025-12-04T08:58:39.3260699Z 086b1df51ac1: Download complete
2025-12-04T08:58:39.4145522Z fe8a7b64bf98: Download complete
2025-12-04T08:58:39.4974398Z 7680723e9a57: Verifying Checksum
2025-12-04T08:58:39.4974680Z 7680723e9a57: Download complete
2025-12-04T08:58:39.5123425Z 63e5bc7682b8: Download complete
2025-12-04T08:58:39.5613885Z 9c5027aeeb4e: Verifying Checksum
2025-12-04T08:58:39.5614244Z 9c5027aeeb4e: Download complete
2025-12-04T08:58:39.6031199Z 9a5652110360: Verifying Checksum
2025-12-04T08:58:39.6031472Z 9a5652110360: Download complete
2025-12-04T08:58:39.6640839Z a86faaa7dbdd: Verifying Checksum
2025-12-04T08:58:39.6641343Z a86faaa7dbdd: Download complete
2025-12-04T08:58:39.7477925Z fb7848686804: Verifying Checksum
2025-12-04T08:58:39.7478340Z fb7848686804: Download complete
2025-12-04T08:58:39.8344466Z 3541df015cdb: Download complete
2025-12-04T08:58:39.9083486Z 79dc80f426b2: Verifying Checksum
2025-12-04T08:58:39.9083868Z 79dc80f426b2: Download complete
2025-12-04T08:58:40.3790113Z 63e5bc7682b8: Pull complete
2025-12-04T08:58:40.4009085Z 0678d56345c9: Pull complete
2025-12-04T08:58:40.7043607Z 375c4427e914: Verifying Checksum
2025-12-04T08:58:40.7044264Z 375c4427e914: Download complete
2025-12-04T08:58:40.7111367Z 4f4fb700ef54: Verifying Checksum
2025-12-04T08:58:40.7111658Z 4f4fb700ef54: Download complete
2025-12-04T08:58:40.7843256Z 549db4d6c618: Download complete
2025-12-04T08:58:40.8465371Z 5c63528cb580: Verifying Checksum
2025-12-04T08:58:40.9213050Z 5c63528cb580: Download complete
2025-12-04T08:58:40.9213428Z 75bd83b989a4: Verifying Checksum
2025-12-04T08:58:40.9213655Z 75bd83b989a4: Download complete
2025-12-04T08:58:41.0092324Z de6e78970f51: Verifying Checksum
2025-12-04T08:58:41.0092676Z de6e78970f51: Download complete
2025-12-04T08:58:41.0719420Z e13ed7c7e473: Verifying Checksum
2025-12-04T08:58:41.0719736Z e13ed7c7e473: Download complete
2025-12-04T08:58:41.1404202Z 6e2949bcb741: Download complete
2025-12-04T08:58:41.2099949Z 14d69d9aaec7: Download complete
2025-12-04T08:58:41.2953005Z 5c02769dd8e5: Download complete
2025-12-04T08:58:42.3430012Z 45f5c9ddfce7: Verifying Checksum
2025-12-04T08:58:42.3430471Z 45f5c9ddfce7: Download complete
2025-12-04T08:58:42.4204875Z 2fa92dc5885e: Verifying Checksum
2025-12-04T08:58:42.4205173Z 2fa92dc5885e: Download complete
2025-12-04T08:58:42.8284215Z 2b85eafbd92a: Verifying Checksum
2025-12-04T08:58:42.8284707Z 2b85eafbd92a: Download complete
2025-12-04T08:58:42.8892606Z ff755a4ddad7: Download complete
2025-12-04T08:58:42.9686090Z 09eb41bdf42d: Verifying Checksum
2025-12-04T08:58:42.9686615Z 09eb41bdf42d: Download complete
2025-12-04T08:58:47.6006796Z 11ede4d59e93: Verifying Checksum
2025-12-04T08:58:47.6007163Z 11ede4d59e93: Download complete
2025-12-04T08:58:47.6779977Z 1283cd8f801a: Verifying Checksum
2025-12-04T08:58:47.6780276Z 1283cd8f801a: Download complete
2025-12-04T08:58:47.7395022Z 024fa855425f: Verifying Checksum
2025-12-04T08:58:47.7395662Z 024fa855425f: Download complete
2025-12-04T08:58:47.8209123Z 303e6747a62e: Verifying Checksum
2025-12-04T08:58:47.8209577Z 303e6747a62e: Download complete
2025-12-04T08:58:47.9116810Z 3017cdf4838b: Verifying Checksum
2025-12-04T08:58:47.9117582Z 3017cdf4838b: Download complete
2025-12-04T08:58:48.1443117Z 6b6cd1c358e8: Verifying Checksum
2025-12-04T08:58:48.1443432Z 6b6cd1c358e8: Download complete
2025-12-04T08:58:48.2261239Z b2dd04501124: Verifying Checksum
2025-12-04T08:58:48.2261537Z b2dd04501124: Download complete
2025-12-04T08:58:48.2774043Z 55adc51fe589: Verifying Checksum
2025-12-04T08:58:48.2774434Z 55adc51fe589: Download complete
2025-12-04T08:58:48.3603096Z a43ca0e4b837: Verifying Checksum
2025-12-04T08:58:48.3603437Z a43ca0e4b837: Download complete
2025-12-04T08:58:48.4568859Z b7212f17fd14: Download complete
2025-12-04T08:58:48.5670077Z 083e42cac090: Download complete
2025-12-04T08:58:48.6753811Z 0a00b784a4aa: Download complete
2025-12-04T08:58:48.7632409Z c6173c779f7b: Download complete
2025-12-04T08:58:49.3466875Z 45f5c9ddfce7: Pull complete
2025-12-04T08:58:49.3683444Z 086b1df51ac1: Pull complete
2025-12-04T08:58:49.3890907Z fe8a7b64bf98: Pull complete
2025-12-04T08:58:49.4099891Z 7680723e9a57: Pull complete
2025-12-04T08:58:49.4330392Z 9c5027aeeb4e: Pull complete
2025-12-04T08:58:49.4537981Z 9a5652110360: Pull complete
2025-12-04T08:58:50.2603346Z ed3d1e3387b9: Verifying Checksum
2025-12-04T08:58:50.2603654Z ed3d1e3387b9: Download complete
2025-12-04T08:58:50.3281025Z b29343478586: Download complete
2025-12-04T08:58:51.3847447Z 375c4427e914: Pull complete
2025-12-04T08:58:51.5367216Z a86faaa7dbdd: Pull complete
2025-12-04T08:58:51.7544843Z fb7848686804: Pull complete
2025-12-04T08:58:51.9246232Z 3541df015cdb: Pull complete
2025-12-04T08:58:52.0322390Z 79dc80f426b2: Pull complete
2025-12-04T08:58:53.4892508Z c6f0520487fb: Verifying Checksum
2025-12-04T08:58:53.4892872Z c6f0520487fb: Download complete
2025-12-04T08:59:11.7949970Z a13fcc1b90bb: Verifying Checksum
2025-12-04T08:59:11.7950271Z a13fcc1b90bb: Download complete
2025-12-04T08:59:11.8758532Z 2c666d30ed77: Download complete
2025-12-04T08:59:11.9464251Z 5d8d3a0a98e0: Verifying Checksum
2025-12-04T08:59:11.9464639Z 5d8d3a0a98e0: Download complete
2025-12-04T08:59:12.0379304Z b06bafce9e81: Download complete
2025-12-04T08:59:12.1083085Z 15e0d7e4590d: Download complete
2025-12-04T08:59:12.2582182Z 57b84ee60002: Verifying Checksum
2025-12-04T08:59:12.2582573Z 57b84ee60002: Download complete
2025-12-04T08:59:12.3395244Z b8babeff6d81: Verifying Checksum
2025-12-04T08:59:12.3395570Z b8babeff6d81: Download complete
2025-12-04T08:59:12.4512261Z 83779ddf6a85: Verifying Checksum
2025-12-04T08:59:12.4512563Z 83779ddf6a85: Download complete
2025-12-04T08:59:12.5199550Z 8b7620c0d736: Verifying Checksum
2025-12-04T08:59:12.5200074Z 8b7620c0d736: Download complete
2025-12-04T08:59:12.5847812Z 3bcfa090e4ef: Verifying Checksum
2025-12-04T08:59:12.5848127Z 3bcfa090e4ef: Download complete
2025-12-04T08:59:12.6578919Z eb0504ec4d92: Download complete
2025-12-04T08:59:12.7354960Z 15d0fec09d7b: Verifying Checksum
2025-12-04T08:59:12.7355419Z 15d0fec09d7b: Download complete
2025-12-04T08:59:12.8002885Z cca81fcc62a9: Verifying Checksum
2025-12-04T08:59:12.8003333Z cca81fcc62a9: Download complete
2025-12-04T08:59:12.8967342Z b0b8f9b5c6ab: Verifying Checksum
2025-12-04T08:59:12.8967771Z b0b8f9b5c6ab: Download complete
2025-12-04T08:59:12.9666897Z 0606ca4d47a8: Download complete
2025-12-04T08:59:13.0357609Z 2f80a4e1b3b9: Download complete
2025-12-04T08:59:13.0957861Z 35c916fb1bd0: Verifying Checksum
2025-12-04T08:59:13.0958174Z 35c916fb1bd0: Download complete
2025-12-04T08:59:15.1149902Z 195537b7dafc: Verifying Checksum
2025-12-04T08:59:15.1150304Z 195537b7dafc: Download complete
2025-12-04T08:59:15.1867048Z dc454fd3967e: Verifying Checksum
2025-12-04T08:59:15.1867595Z dc454fd3967e: Download complete
2025-12-04T08:59:15.2735184Z 701b34f115fa: Verifying Checksum
2025-12-04T08:59:15.2735783Z 701b34f115fa: Download complete
2025-12-04T08:59:15.3564127Z 39cefc00ffed: Verifying Checksum
2025-12-04T08:59:15.3564707Z 39cefc00ffed: Download complete
2025-12-04T08:59:15.4412206Z 6ae51eb61a32: Verifying Checksum
2025-12-04T08:59:15.4412603Z 6ae51eb61a32: Download complete
2025-12-04T08:59:15.5316543Z 1fd5341e66df: Verifying Checksum
2025-12-04T08:59:15.5317203Z 1fd5341e66df: Download complete
2025-12-04T08:59:15.7195394Z 72a7c87e35e4: Download complete
2025-12-04T08:59:15.7876759Z ec36862ac98e: Verifying Checksum
2025-12-04T08:59:15.7877288Z ec36862ac98e: Download complete
2025-12-04T08:59:16.3794642Z 05ddbf246e8a: Verifying Checksum
2025-12-04T08:59:16.3794956Z 05ddbf246e8a: Download complete
2025-12-04T08:59:24.1228190Z 148171691cd4: Verifying Checksum
2025-12-04T08:59:24.1228624Z 148171691cd4: Download complete
2025-12-04T09:00:02.0273717Z 35041ce524ac: Verifying Checksum
2025-12-04T09:00:02.0274026Z 35041ce524ac: Download complete
2025-12-04T09:00:32.9743998Z a13fcc1b90bb: Pull complete
2025-12-04T09:00:33.1767352Z 4f4fb700ef54: Pull complete
2025-12-04T09:00:33.3351169Z 549db4d6c618: Pull complete
2025-12-04T09:00:33.4766145Z 5c63528cb580: Pull complete
2025-12-04T09:00:33.5536632Z 75bd83b989a4: Pull complete
2025-12-04T09:00:33.7630396Z de6e78970f51: Pull complete
2025-12-04T09:00:33.9828367Z e13ed7c7e473: Pull complete
2025-12-04T09:00:34.1997282Z 6e2949bcb741: Pull complete
2025-12-04T09:00:34.4059058Z 14d69d9aaec7: Pull complete
2025-12-04T09:00:34.6100826Z 5c02769dd8e5: Pull complete
2025-12-04T09:02:07.5438834Z 35041ce524ac: Pull complete
2025-12-04T09:02:07.7550595Z 2fa92dc5885e: Pull complete
2025-12-04T09:02:08.3243892Z 2b85eafbd92a: Pull complete
2025-12-04T09:02:08.5458917Z ff755a4ddad7: Pull complete
2025-12-04T09:02:08.5889901Z 09eb41bdf42d: Pull complete
2025-12-04T09:02:15.0556669Z 11ede4d59e93: Pull complete
2025-12-04T09:02:15.2714728Z 1283cd8f801a: Pull complete
2025-12-04T09:02:15.4910613Z 024fa855425f: Pull complete
2025-12-04T09:02:15.9371453Z 303e6747a62e: Pull complete
2025-12-04T09:02:16.1571346Z 3017cdf4838b: Pull complete
2025-12-04T09:02:16.5432220Z 6b6cd1c358e8: Pull complete
2025-12-04T09:02:16.7674686Z b2dd04501124: Pull complete
2025-12-04T09:02:16.9732119Z 55adc51fe589: Pull complete
2025-12-04T09:02:17.3881846Z a43ca0e4b837: Pull complete
2025-12-04T09:02:17.6029733Z b7212f17fd14: Pull complete
2025-12-04T09:02:17.8179863Z 083e42cac090: Pull complete
2025-12-04T09:02:18.2411123Z 0a00b784a4aa: Pull complete
2025-12-04T09:02:18.4620196Z c6173c779f7b: Pull complete
2025-12-04T09:02:21.2072356Z ed3d1e3387b9: Pull complete
2025-12-04T09:02:21.4280437Z b29343478586: Pull complete
2025-12-04T09:02:22.4863175Z c6f0520487fb: Pull complete
2025-12-04T09:03:06.0565910Z 148171691cd4: Pull complete
2025-12-04T09:03:06.2947333Z 2c666d30ed77: Pull complete
2025-12-04T09:03:06.4990987Z 5d8d3a0a98e0: Pull complete
2025-12-04T09:03:06.8366147Z b06bafce9e81: Pull complete
2025-12-04T09:03:07.1562059Z 15e0d7e4590d: Pull complete
2025-12-04T09:03:07.2750898Z a514bd1add31: Pull complete
2025-12-04T09:03:07.4056939Z 57b84ee60002: Pull complete
2025-12-04T09:03:07.6220025Z b8babeff6d81: Pull complete
2025-12-04T09:03:07.7959365Z 83779ddf6a85: Pull complete
2025-12-04T09:03:08.1557494Z 8b7620c0d736: Pull complete
2025-12-04T09:03:08.4596249Z 3bcfa090e4ef: Pull complete
2025-12-04T09:03:08.6464626Z eb0504ec4d92: Pull complete
2025-12-04T09:03:08.9987675Z 15d0fec09d7b: Pull complete
2025-12-04T09:03:09.2045198Z cca81fcc62a9: Pull complete
2025-12-04T09:03:09.5467074Z b0b8f9b5c6ab: Pull complete
2025-12-04T09:03:09.7625940Z 0606ca4d47a8: Pull complete
2025-12-04T09:03:10.1242307Z 2f80a4e1b3b9: Pull complete
2025-12-04T09:03:10.3285623Z 35c916fb1bd0: Pull complete
2025-12-04T09:03:15.5435198Z 195537b7dafc: Pull complete
2025-12-04T09:03:15.7492096Z dc454fd3967e: Pull complete
2025-12-04T09:03:15.9742809Z 701b34f115fa: Pull complete
2025-12-04T09:03:16.1993277Z 39cefc00ffed: Pull complete
2025-12-04T09:03:16.4171098Z 6ae51eb61a32: Pull complete
2025-12-04T09:03:16.6549940Z 1fd5341e66df: Pull complete
2025-12-04T09:03:18.0347784Z 72a7c87e35e4: Pull complete
2025-12-04T09:03:18.2387536Z ec36862ac98e: Pull complete
2025-12-04T09:03:19.4699795Z 05ddbf246e8a: Pull complete
2025-12-04T09:03:19.7312394Z Digest: sha256:ba21003510dba4bdeed83df81a56fa468e0ee1b612a9445ae1f402a280804f97
2025-12-04T09:03:19.7705816Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:03:19.7894905Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:03:19.7953823Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:03:19.7954561Z [36;1mecho "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"[0m
2025-12-04T09:03:19.7963587Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:03:19.7963867Z env:
2025-12-04T09:03:19.7964029Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:03:19.7964211Z ##[endgroup]
2025-12-04T09:03:19.8101813Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main
2025-12-04T09:03:19.8102156Z with:
2025-12-04T09:03:19.8102346Z   driver-version: 580.82.07
2025-12-04T09:03:19.8102552Z env:
2025-12-04T09:03:19.8102720Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:03:19.8102913Z ##[endgroup]
2025-12-04T09:03:19.8199522Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:03:19.8200295Z [36;1mecho "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"[0m
2025-12-04T09:03:19.8207647Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:03:19.8207925Z env:
2025-12-04T09:03:19.8208088Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:03:19.8208269Z ##[endgroup]
2025-12-04T09:03:19.8317286Z ##[group]Run set -euo pipefail
2025-12-04T09:03:19.8317639Z [36;1mset -euo pipefail[0m
2025-12-04T09:03:19.8317903Z [36;1m[0m
2025-12-04T09:03:19.8318092Z [36;1mhas_gpu=false[0m
2025-12-04T09:03:19.8318319Z [36;1mdevices=""[0m
2025-12-04T09:03:19.8318804Z [36;1m[0m
2025-12-04T09:03:19.8319052Z [36;1mif command -v nvidia-smi >/dev/null 2>&1; then[0m
2025-12-04T09:03:19.8319469Z [36;1m  if nvidia-smi -L >/tmp/nvidia_devices 2>/dev/null; then[0m
2025-12-04T09:03:19.8319808Z [36;1m    has_gpu=true[0m
2025-12-04T09:03:19.8320159Z [36;1m    devices=$(cat /tmp/nvidia_devices)[0m
2025-12-04T09:03:19.8320432Z [36;1m  fi[0m
2025-12-04T09:03:19.8320625Z [36;1mfi[0m
2025-12-04T09:03:19.8320778Z [36;1m[0m
2025-12-04T09:03:19.8320933Z [36;1mif [ "$has_gpu" = false ]; then[0m
2025-12-04T09:03:19.8321203Z [36;1m  if ls /dev/nvidia* >/tmp/nvidia_devices 2>/dev/null; then[0m
2025-12-04T09:03:19.8321469Z [36;1m    has_gpu=true[0m
2025-12-04T09:03:19.8321669Z [36;1m    devices=$(cat /tmp/nvidia_devices)[0m
2025-12-04T09:03:19.8321881Z [36;1m  fi[0m
2025-12-04T09:03:19.8322026Z [36;1mfi[0m
2025-12-04T09:03:19.8322177Z [36;1m[0m
2025-12-04T09:03:19.8322407Z [36;1mif [ "$has_gpu" = false ] && command -v lspci >/dev/null 2>&1; then[0m
2025-12-04T09:03:19.8322777Z [36;1m  if lspci | grep -i 'nvidia' >/tmp/nvidia_devices 2>/dev/null; then[0m
2025-12-04T09:03:19.8323063Z [36;1m    has_gpu=true[0m
2025-12-04T09:03:19.8323264Z [36;1m    devices=$(cat /tmp/nvidia_devices)[0m
2025-12-04T09:03:19.8323482Z [36;1m  fi[0m
2025-12-04T09:03:19.8323630Z [36;1mfi[0m
2025-12-04T09:03:19.8323767Z [36;1m[0m
2025-12-04T09:03:19.8323977Z [36;1mprintf 'HAS_NVIDIA=%s\n' "$has_gpu" >> "$GITHUB_OUTPUT"[0m
2025-12-04T09:03:19.8324347Z [36;1mprintf 'DETECTED_DEVICES<<EOF\n%s\nEOF\n' "$devices" >> "$GITHUB_OUTPUT"[0m
2025-12-04T09:03:19.8331362Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:03:19.8331630Z env:
2025-12-04T09:03:19.8331784Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:03:19.8331967Z ##[endgroup]
2025-12-04T09:03:21.4649636Z ##[group]Run if [ "${HAS_NVIDIA}" = "true" ]; then
2025-12-04T09:03:21.4650026Z [36;1mif [ "${HAS_NVIDIA}" = "true" ]; then[0m
2025-12-04T09:03:21.4650368Z [36;1m  echo "HAS_NVIDIA_GPU=true" >> "${GITHUB_ENV}"[0m
2025-12-04T09:03:21.4650847Z [36;1m  echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}"[0m
2025-12-04T09:03:21.4651193Z [36;1melse[0m
2025-12-04T09:03:21.4651393Z [36;1m  echo "HAS_NVIDIA_GPU=false" >> "${GITHUB_ENV}"[0m
2025-12-04T09:03:21.4651647Z [36;1mfi[0m
2025-12-04T09:03:21.4659590Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:03:21.4660045Z env:
2025-12-04T09:03:21.4660226Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:03:21.4660418Z   HAS_NVIDIA: true
2025-12-04T09:03:21.4660583Z ##[endgroup]
2025-12-04T09:03:21.4740019Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482
2025-12-04T09:03:21.4740323Z with:
2025-12-04T09:03:21.4740473Z   timeout_minutes: 10
2025-12-04T09:03:21.4740649Z   max_attempts: 3
2025-12-04T09:03:21.4759686Z   command: # Is it disgusting to have a full shell script here in this github action? Sure
# But is it the best way to make it so that this action relies on nothing else? Absolutely
set -eou pipefail

DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID)
DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"

install_nvidia_docker2_amzn2() {
    (
        set -x
        # Needed for yum-config-manager
        sudo yum install -y yum-utils
        if [[ "${DISTRIBUTION}" == "amzn2023" ]] ; then
          YUM_REPO_URL="https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo"
        else
          # Amazon Linux 2
          YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo"
        fi

        sudo yum-config-manager --add-repo "${YUM_REPO_URL}"
        sudo yum install -y \
          nvidia-container-toolkit-1.17.8 \
          libnvidia-container-tools-1.17.8 \
          libnvidia-container1-1.17.8 \
          nvidia-container-toolkit-base-1.17.8
        sudo systemctl restart docker
    )
}

install_nvidia_docker2_ubuntu20() {
    (
        set -x
        # Install nvidia-driver package if not installed
        status="$(dpkg-query -W --showformat='${db:Status-Status}' nvidia-docker2 2>&1)"
        if [ ! $? = 0 ] || [ ! "$status" = installed ]; then
          sudo apt-get install -y nvidia-container-toolkit-1.17.8
          sudo systemctl restart docker
        fi
    )
}

pre_install_nvidia_driver_amzn2() {
    (
        # Purge any nvidia driver installed from RHEL repo
        sudo yum remove -y nvidia-driver-latest-dkms
    )
}

install_nvidia_driver_common() {
    (
        # Try to gather more information about the runner and its existing NVIDIA driver if any
        echo "Before installing NVIDIA driver"
        lspci
        lsmod
        modinfo nvidia || true

        HAS_NVIDIA_DRIVER=0
        # Check if NVIDIA driver has already been installed
        if [ -x "$(command -v nvidia-smi)" ]; then
            set +e
            # The driver exists, check its version next. Also check only the first GPU if there are more than one of them
            # so that the same driver version is not print over multiple lines
            INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
            NVIDIA_SMI_STATUS=$?

            if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
                echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). Continuing"
            elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then
                echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing"

                # Turn off persistent mode so that the installation script can unload the kernel module
                sudo killall nvidia-persistenced || true
            else
                HAS_NVIDIA_DRIVER=1
                echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation"
            fi
            set -e
        fi

        if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then
            # CAUTION: this may need to be updated in future
            if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then
                  sudo yum groupinstall -y "Development Tools"
                  # ensure our kernel install is the same as our underlying kernel,
                  # groupinstall "Development Tools" has a habit of mismatching kernel headers
                  sudo yum install -y "kernel-devel-uname-r == $(uname -r)"
                  sudo modprobe backlight
            fi
            sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"

            set +e
            sudo /bin/bash /tmp/nvidia_driver -s --no-drm
            NVIDIA_INSTALLATION_STATUS=$?

            RESET_GPU=0
            if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then
                sudo cat /var/log/nvidia-installer.log
                # Fail to install NVIDIA driver, try to reset the GPU
                RESET_GPU=1
            elif [ -x "$(command -v nvidia-smi)" ]; then
                # Check again if nvidia-smi works even if the driver installation completes successfully
                INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
                NVIDIA_SMI_STATUS=$?

                if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
                    RESET_GPU=1
                fi
            fi

            if [ "$RESET_GPU" -eq 1 ]; then
                NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1)
                # The GPU can get stuck in a failure state if somehow the test crashs the GPU microcode. When this
                # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388
                for PCI_ID in $NVIDIA_DEVICES; do
                    DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable)

                    echo "Reseting $PCI_ID (enabled state: $DEVICE_ENABLED)"
                    # This requires sudo permission of course
                    echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset
                    sleep 1
                done
            fi

            sudo rm -fv /tmp/nvidia_driver
            set -e
        fi
    )
}

post_install_nvidia_driver_common() {
    (
        sudo modprobe nvidia || true
        echo "After installing NVIDIA driver"
        lspci
        lsmod
        modinfo nvidia || true

        (
            set +e

            nvidia-smi
            # NB: Annoyingly, nvidia-smi command returns successfully with return code 0 even in
            # the case where the driver has already crashed as it still can get the driver version
            # and some basic information like the bus ID.  However, the rest of the information
            # would be missing (ERR!), for example:
            #
            # +-----------------------------------------------------------------------------+
            # | NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
            # |-------------------------------+----------------------+----------------------+
            # | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
            # | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
            # |                               |                      |               MIG M. |
            # |===============================+======================+======================|
            # |   0  ERR!                Off  | 00000000:00:1E.0 Off |                 ERR! |
            # |ERR!  ERR! ERR!    ERR! / ERR! |   4184MiB / 23028MiB |    ERR!      Default |
            # |                               |                      |                 ERR! |
            # +-------------------------------+----------------------+----------------------+
            #
            # +-----------------------------------------------------------------------------+
            # | Processes:                                                                  |
            # |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
            # |        ID   ID                                                   Usage      |
            # |=============================================================================|
            # +-----------------------------------------------------------------------------+
            #
            # This should be reported as a failure instead as it will guarantee to fail when
            # Docker tries to run with --gpus all
            #
            # So, the correct check here is to query one of the missing piece of info like
            # GPU name, so that the command can fail accordingly
            nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
            NVIDIA_SMI_STATUS=$?

            # Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285
            if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then
                echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}"
            else
                echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}"
                exit ${NVIDIA_SMI_STATUS}
            fi
            set -e
        )
    )
}

install_nvidia_driver_amzn2() {
    (
        set -x
        pre_install_nvidia_driver_amzn2
        install_nvidia_driver_common
        post_install_nvidia_driver_common
    )
}

install_nvidia_driver_ubuntu20() {
    (
        set -x
        install_nvidia_driver_common
        post_install_nvidia_driver_common
    )
}

echo "== Installing nvidia driver ${DRIVER_FN} =="
case "${DISTRIBUTION}" in
    amzn*)
        install_nvidia_driver_amzn2
        ;;
    ubuntu20.04)
        install_nvidia_driver_ubuntu20
        ;;
    *)
        echo "ERROR: Unknown distribution ${DISTRIBUTION}"
        exit 1
        ;;
esac

# Install container toolkit based on distribution
echo "== Installing nvidia container toolkit for ${DISTRIBUTION} =="
case "${DISTRIBUTION}" in
    amzn*)
        install_nvidia_docker2_amzn2
        ;;
    ubuntu20.04)
        install_nvidia_docker2_ubuntu20
        ;;
    *)
        echo "ERROR: Unknown distribution ${DISTRIBUTION}"
        exit 1
        ;;
esac

# Fix https://github.com/NVIDIA/nvidia-docker/issues/1648 on runners with
# more than one GPUs. This just needs to be run once. The command fails
# on subsequent runs and complains that the mode is already on, but that's
# ok
sudo nvidia-persistenced || true
# This should show persistence mode ON
nvidia-smi

# check if the container-toolkit is correctly installed and CUDA is available inside a container
docker run --rm -t --gpus=all public.ecr.aws/docker/library/python:3.13 nvidia-smi

2025-12-04T09:03:21.4778737Z   retry_wait_seconds: 10
2025-12-04T09:03:21.4778934Z   polling_interval_seconds: 1
2025-12-04T09:03:21.4779137Z   warning_on_retry: true
2025-12-04T09:03:21.4779330Z   continue_on_error: false
2025-12-04T09:03:21.4779509Z env:
2025-12-04T09:03:21.4779654Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:03:21.4779842Z   HAS_NVIDIA_GPU: true
2025-12-04T09:03:21.4780062Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:03:21.4780317Z   DRIVER_VERSION: 580.82.07
2025-12-04T09:03:21.4780509Z ##[endgroup]
2025-12-04T09:03:21.5471290Z == Installing nvidia driver NVIDIA-Linux-x86_64-580.82.07.run ==
2025-12-04T09:03:21.5473076Z + pre_install_nvidia_driver_amzn2
2025-12-04T09:03:21.5474535Z + sudo yum remove -y nvidia-driver-latest-dkms
2025-12-04T09:03:22.1240532Z No match for argument: nvidia-driver-latest-dkms
2025-12-04T09:03:22.1241343Z No packages marked for removal.
2025-12-04T09:03:22.1296202Z Dependencies resolved.
2025-12-04T09:03:22.1304756Z Nothing to do.
2025-12-04T09:03:22.1305743Z Complete!
2025-12-04T09:03:22.2014362Z + install_nvidia_driver_common
2025-12-04T09:03:22.2020399Z + echo 'Before installing NVIDIA driver'
2025-12-04T09:03:22.2022141Z + lspci
2025-12-04T09:03:22.2024696Z Before installing NVIDIA driver
2025-12-04T09:03:22.2825371Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
2025-12-04T09:03:22.2826213Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
2025-12-04T09:03:22.2826871Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
2025-12-04T09:03:22.2827542Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111
2025-12-04T09:03:22.2828422Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller
2025-12-04T09:03:22.2828898Z 01:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2829168Z 02:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2829646Z 03:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2830109Z 03:00.1 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2830575Z 03:00.2 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2831273Z 03:00.3 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2831562Z 03:00.4 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2831818Z 03:00.5 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2832070Z 03:00.6 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2832312Z 03:00.7 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2832551Z 03:01.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2832785Z 03:01.1 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2833023Z 03:01.2 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2833274Z 03:01.3 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2833515Z 03:01.4 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2833749Z 03:01.5 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2833985Z 03:01.6 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2834223Z 03:01.7 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2834462Z 03:02.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2834708Z 03:02.1 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2835101Z 03:02.2 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2835732Z 03:02.3 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2836109Z 03:02.4 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2836461Z 03:02.5 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2836915Z 03:02.6 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2837238Z 03:02.7 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2837480Z 03:03.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2837721Z 03:03.1 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2837958Z 03:03.2 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2838197Z 03:03.3 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2838438Z 03:03.4 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2838681Z 03:03.5 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2838921Z 03:03.6 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2839163Z 03:03.7 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2839408Z 24:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2839654Z 25:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2840034Z 26:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2840504Z 26:00.1 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2840956Z 26:00.2 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2841407Z 26:00.3 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2841868Z 26:00.4 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2842211Z 26:00.5 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2842460Z 26:00.6 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2842704Z 26:00.7 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2842951Z 26:01.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2843291Z 27:00.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
2025-12-04T09:03:22.2843839Z 30:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2844282Z 31:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2844647Z 32:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2844976Z 33:00.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
2025-12-04T09:03:22.2845310Z 34:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:22.2845600Z 35:00.0 3D controller: NVIDIA Corporation AD104GL [L4] (rev a1)
2025-12-04T09:03:22.2846035Z + lsmod
2025-12-04T09:03:22.2870261Z Module                  Size  Used by
2025-12-04T09:03:22.2870585Z nvidia_uvm           1925120  0
2025-12-04T09:03:22.2870866Z nvidia              14286848  1 nvidia_uvm
2025-12-04T09:03:22.2871375Z drm                   602112  1 nvidia
2025-12-04T09:03:22.2871910Z drm_panel_orientation_quirks    32768  1 drm
2025-12-04T09:03:22.2872465Z backlight              24576  1 drm
2025-12-04T09:03:22.2872775Z i2c_core              110592  2 nvidia,drm
2025-12-04T09:03:22.2873053Z xt_conntrack           16384  1
2025-12-04T09:03:22.2873302Z nft_chain_nat          16384  3
2025-12-04T09:03:22.2873528Z xt_MASQUERADE          20480  1
2025-12-04T09:03:22.2873808Z nf_nat                 57344  2 nft_chain_nat,xt_MASQUERADE
2025-12-04T09:03:22.2874118Z nf_conntrack_netlink    57344  0
2025-12-04T09:03:22.2874482Z nf_conntrack          184320  4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE
2025-12-04T09:03:22.2874891Z nf_defrag_ipv6         24576  1 nf_conntrack
2025-12-04T09:03:22.2875172Z nf_defrag_ipv4         16384  1 nf_conntrack
2025-12-04T09:03:22.2875436Z xfrm_user              57344  1
2025-12-04T09:03:22.2875699Z xfrm_algo              16384  1 xfrm_user
2025-12-04T09:03:22.2875968Z xt_addrtype            16384  2
2025-12-04T09:03:22.2876190Z nft_compat             20480  4
2025-12-04T09:03:22.2876469Z nf_tables             311296  57 nft_compat,nft_chain_nat
2025-12-04T09:03:22.2876857Z nfnetlink              20480  4 nft_compat,nf_conntrack_netlink,nf_tables
2025-12-04T09:03:22.2877328Z br_netfilter           36864  0
2025-12-04T09:03:22.2877578Z bridge                323584  1 br_netfilter
2025-12-04T09:03:22.2877851Z stp                    16384  1 bridge
2025-12-04T09:03:22.2878114Z llc                    16384  2 bridge,stp
2025-12-04T09:03:22.2878369Z overlay               167936  0
2025-12-04T09:03:22.2878589Z tls                   139264  0
2025-12-04T09:03:22.2878805Z nls_ascii              16384  1
2025-12-04T09:03:22.2879022Z nls_cp437              20480  1
2025-12-04T09:03:22.2879239Z vfat                   24576  1
2025-12-04T09:03:22.2879461Z fat                    86016  1 vfat
2025-12-04T09:03:22.2879694Z sunrpc                700416  1
2025-12-04T09:03:22.2880030Z ena                   184320  0
2025-12-04T09:03:22.2880259Z i8042                  45056  0
2025-12-04T09:03:22.2880491Z serio                  28672  3 i8042
2025-12-04T09:03:22.2880733Z ghash_clmulni_intel    16384  0
2025-12-04T09:03:22.2880969Z button                 24576  0
2025-12-04T09:03:22.2881193Z sch_fq_codel           20480  9
2025-12-04T09:03:22.2881419Z dm_mod                188416  0
2025-12-04T09:03:22.2881614Z fuse                  184320  1
2025-12-04T09:03:22.2881794Z loop                   36864  0
2025-12-04T09:03:22.2881968Z configfs               57344  1
2025-12-04T09:03:22.2882150Z dmi_sysfs              20480  0
2025-12-04T09:03:22.2882332Z crc32_pclmul           16384  0
2025-12-04T09:03:22.2882509Z crc32c_intel           24576  0
2025-12-04T09:03:22.2882694Z efivarfs               24576  1
2025-12-04T09:03:22.2882884Z + modinfo nvidia
2025-12-04T09:03:22.2888734Z filename:       /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko
2025-12-04T09:03:22.2889144Z import_ns:      DMA_BUF
2025-12-04T09:03:22.2889356Z alias:          char-major-195-*
2025-12-04T09:03:22.2889729Z version:        580.82.07
2025-12-04T09:03:22.2890076Z supported:      external
2025-12-04T09:03:22.2890395Z license:        Dual MIT/GPL
2025-12-04T09:03:22.2890784Z firmware:       nvidia/580.82.07/gsp_tu10x.bin
2025-12-04T09:03:22.2891256Z firmware:       nvidia/580.82.07/gsp_ga10x.bin
2025-12-04T09:03:22.2891658Z srcversion:     BA7240A71DCF7DC6FE88C1D
2025-12-04T09:03:22.2891933Z alias:          of:N*T*Cnvidia,tegra264-displayC*
2025-12-04T09:03:22.2892206Z alias:          of:N*T*Cnvidia,tegra264-display
2025-12-04T09:03:22.2892472Z alias:          of:N*T*Cnvidia,tegra234-displayC*
2025-12-04T09:03:22.2892740Z alias:          of:N*T*Cnvidia,tegra234-display
2025-12-04T09:03:22.2892998Z alias:          pci:v000010DEd*sv*sd*bc06sc80i00*
2025-12-04T09:03:22.2893320Z alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
2025-12-04T09:03:22.2893778Z alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
2025-12-04T09:03:22.2894222Z depends:        i2c-core,drm
2025-12-04T09:03:22.2894520Z retpoline:      Y
2025-12-04T09:03:22.2894832Z name:           nvidia
2025-12-04T09:03:22.2895127Z vermagic:       6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 
2025-12-04T09:03:22.2895492Z parm:           NvSwitchRegDwords:NvSwitch regkey (charp)
2025-12-04T09:03:22.2895830Z parm:           NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp)
2025-12-04T09:03:22.2896142Z parm:           NVreg_ResmanDebugLevel:int
2025-12-04T09:03:22.2896370Z parm:           NVreg_RmLogonRC:int
2025-12-04T09:03:22.2896593Z parm:           NVreg_ModifyDeviceFiles:int
2025-12-04T09:03:22.2896818Z parm:           NVreg_DeviceFileUID:int
2025-12-04T09:03:22.2897036Z parm:           NVreg_DeviceFileGID:int
2025-12-04T09:03:22.2897258Z parm:           NVreg_DeviceFileMode:int
2025-12-04T09:03:22.2897523Z parm:           NVreg_InitializeSystemMemoryAllocations:int
2025-12-04T09:03:22.2897840Z parm:           NVreg_UsePageAttributeTable:int
2025-12-04T09:03:22.2898316Z parm:           NVreg_EnablePCIeGen3:int
2025-12-04T09:03:22.2898768Z parm:           NVreg_EnableMSI:int
2025-12-04T09:03:22.2899198Z parm:           NVreg_EnableStreamMemOPs:int
2025-12-04T09:03:22.2899714Z parm:           NVreg_RestrictProfilingToAdminUsers:int
2025-12-04T09:03:22.2900172Z parm:           NVreg_PreserveVideoMemoryAllocations:int
2025-12-04T09:03:22.2900460Z parm:           NVreg_EnableS0ixPowerManagement:int
2025-12-04T09:03:22.2900765Z parm:           NVreg_S0ixPowerManagementVideoMemoryThreshold:int
2025-12-04T09:03:22.2901068Z parm:           NVreg_DynamicPowerManagement:int
2025-12-04T09:03:22.2901373Z parm:           NVreg_DynamicPowerManagementVideoMemoryThreshold:int
2025-12-04T09:03:22.2901781Z parm:           NVreg_EnableGpuFirmware:int
2025-12-04T09:03:22.2902248Z parm:           NVreg_EnableGpuFirmwareLogs:int
2025-12-04T09:03:22.2902738Z parm:           NVreg_OpenRmEnableUnsupportedGpus:int
2025-12-04T09:03:22.2903035Z parm:           NVreg_EnableUserNUMAManagement:int
2025-12-04T09:03:22.2903286Z parm:           NVreg_MemoryPoolSize:int
2025-12-04T09:03:22.2903530Z parm:           NVreg_KMallocHeapMaxSize:int
2025-12-04T09:03:22.2903781Z parm:           NVreg_VMallocHeapMaxSize:int
2025-12-04T09:03:22.2904023Z parm:           NVreg_IgnoreMMIOCheck:int
2025-12-04T09:03:22.2904262Z parm:           NVreg_NvLinkDisable:int
2025-12-04T09:03:22.2904514Z parm:           NVreg_EnablePCIERelaxedOrderingMode:int
2025-12-04T09:03:22.2904780Z parm:           NVreg_RegisterPCIDriver:int
2025-12-04T09:03:22.2905044Z parm:           NVreg_RegisterPlatformDeviceDriver:int
2025-12-04T09:03:22.2905303Z parm:           NVreg_EnableResizableBar:int
2025-12-04T09:03:22.2905545Z parm:           NVreg_EnableDbgBreakpoint:int
2025-12-04T09:03:22.2905798Z parm:           NVreg_EnableNonblockingOpen:int
2025-12-04T09:03:22.2906049Z parm:           NVreg_CoherentGPUMemoryMode:charp
2025-12-04T09:03:22.2906299Z parm:           NVreg_RegistryDwords:charp
2025-12-04T09:03:22.2906547Z parm:           NVreg_RegistryDwordsPerDevice:charp
2025-12-04T09:03:22.2906789Z parm:           NVreg_RmMsg:charp
2025-12-04T09:03:22.2907003Z parm:           NVreg_GpuBlacklist:charp
2025-12-04T09:03:22.2907255Z parm:           NVreg_TemporaryFilePath:charp
2025-12-04T09:03:22.2907497Z parm:           NVreg_ExcludedGpus:charp
2025-12-04T09:03:22.2907720Z parm:           NVreg_DmaRemapPeerMmio:int
2025-12-04T09:03:22.2907957Z parm:           NVreg_RmNvlinkBandwidth:charp
2025-12-04T09:03:22.2908216Z parm:           NVreg_RmNvlinkBandwidthLinkCount:int
2025-12-04T09:03:22.2908467Z parm:           NVreg_ImexChannelCount:int
2025-12-04T09:03:22.2908700Z parm:           NVreg_CreateImexChannel0:int
2025-12-04T09:03:22.2908949Z parm:           NVreg_GrdmaPciTopoCheckOverride:int
2025-12-04T09:03:22.2909189Z parm:           rm_firmware_active:charp
2025-12-04T09:03:22.2909420Z + HAS_NVIDIA_DRIVER=0
2025-12-04T09:03:22.2909605Z ++ command -v nvidia-smi
2025-12-04T09:03:22.2909799Z + '[' -x /usr/bin/nvidia-smi ']'
2025-12-04T09:03:22.2909983Z + set +e
2025-12-04T09:03:22.2910344Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0
2025-12-04T09:03:23.9130812Z + INSTALLED_DRIVER_VERSION=580.82.07
2025-12-04T09:03:23.9131128Z + NVIDIA_SMI_STATUS=0
2025-12-04T09:03:23.9132097Z + '[' 0 -ne 0 ']'
2025-12-04T09:03:23.9132275Z + '[' 580.82.07 '!=' 580.82.07 ']'
2025-12-04T09:03:23.9132477Z + HAS_NVIDIA_DRIVER=1
2025-12-04T09:03:23.9132825Z + echo 'NVIDIA driver (580.82.07) has already been installed. Skipping NVIDIA driver installation'
2025-12-04T09:03:23.9133184Z + set -e
2025-12-04T09:03:23.9133331Z + '[' 1 -eq 0 ']'
2025-12-04T09:03:23.9133638Z NVIDIA driver (580.82.07) has already been installed. Skipping NVIDIA driver installation
2025-12-04T09:03:23.9134866Z + post_install_nvidia_driver_common
2025-12-04T09:03:23.9138199Z + sudo modprobe nvidia
2025-12-04T09:03:24.0406134Z + echo 'After installing NVIDIA driver'
2025-12-04T09:03:24.0406597Z + lspci
2025-12-04T09:03:24.0406810Z After installing NVIDIA driver
2025-12-04T09:03:24.0570725Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
2025-12-04T09:03:24.0571287Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
2025-12-04T09:03:24.0571830Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
2025-12-04T09:03:24.0572638Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111
2025-12-04T09:03:24.0573092Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller
2025-12-04T09:03:24.0573517Z 01:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0573837Z 02:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0574279Z 03:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0574635Z 03:00.1 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0574931Z 03:00.2 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0575218Z 03:00.3 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0575515Z 03:00.4 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0575807Z 03:00.5 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0576098Z 03:00.6 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0576391Z 03:00.7 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0576690Z 03:01.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0576976Z 03:01.1 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0577285Z 03:01.2 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0577582Z 03:01.3 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0577870Z 03:01.4 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0578154Z 03:01.5 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0578444Z 03:01.6 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0578732Z 03:01.7 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0579021Z 03:02.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0579310Z 03:02.1 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0579603Z 03:02.2 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0579896Z 03:02.3 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0580192Z 03:02.4 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0580506Z 03:02.5 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0580798Z 03:02.6 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0581092Z 03:02.7 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0581387Z 03:03.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0581685Z 03:03.1 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0581943Z 03:03.2 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0582179Z 03:03.3 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0582430Z 03:03.4 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0582677Z 03:03.5 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0582921Z 03:03.6 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0583159Z 03:03.7 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0583604Z 24:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0583858Z 25:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0584108Z 26:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0584350Z 26:00.1 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0584587Z 26:00.2 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0584830Z 26:00.3 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0585076Z 26:00.4 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0585310Z 26:00.5 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0585551Z 26:00.6 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0585797Z 26:00.7 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0586039Z 26:01.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0586352Z 27:00.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
2025-12-04T09:03:24.0586676Z 30:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0586922Z 31:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0587159Z 32:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0587565Z 33:00.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
2025-12-04T09:03:24.0587889Z 34:00.0 PCI bridge: Amazon.com, Inc. Device 0200
2025-12-04T09:03:24.0588167Z 35:00.0 3D controller: NVIDIA Corporation AD104GL [L4] (rev a1)
2025-12-04T09:03:24.0588428Z + lsmod
2025-12-04T09:03:24.0603159Z Module                  Size  Used by
2025-12-04T09:03:24.0603579Z nvidia_uvm           1925120  0
2025-12-04T09:03:24.0603838Z nvidia              14286848  1 nvidia_uvm
2025-12-04T09:03:24.0604107Z drm                   602112  1 nvidia
2025-12-04T09:03:24.0604386Z drm_panel_orientation_quirks    32768  1 drm
2025-12-04T09:03:24.0604720Z backlight              24576  1 drm
2025-12-04T09:03:24.0605077Z i2c_core              110592  2 nvidia,drm
2025-12-04T09:03:24.0605348Z xt_conntrack           16384  1
2025-12-04T09:03:24.0605590Z nft_chain_nat          16384  3
2025-12-04T09:03:24.0605821Z xt_MASQUERADE          20480  1
2025-12-04T09:03:24.0606088Z nf_nat                 57344  2 nft_chain_nat,xt_MASQUERADE
2025-12-04T09:03:24.0606390Z nf_conntrack_netlink    57344  0
2025-12-04T09:03:24.0606869Z nf_conntrack          184320  4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE
2025-12-04T09:03:24.0607282Z nf_defrag_ipv6         24576  1 nf_conntrack
2025-12-04T09:03:24.0607562Z nf_defrag_ipv4         16384  1 nf_conntrack
2025-12-04T09:03:24.0607814Z xfrm_user              57344  1
2025-12-04T09:03:24.0608075Z xfrm_algo              16384  1 xfrm_user
2025-12-04T09:03:24.0608336Z xt_addrtype            16384  2
2025-12-04T09:03:24.0608556Z nft_compat             20480  4
2025-12-04T09:03:24.0608828Z nf_tables             311296  57 nft_compat,nft_chain_nat
2025-12-04T09:03:24.0609212Z nfnetlink              20480  4 nft_compat,nf_conntrack_netlink,nf_tables
2025-12-04T09:03:24.0609554Z br_netfilter           36864  0
2025-12-04T09:03:24.0609799Z bridge                323584  1 br_netfilter
2025-12-04T09:03:24.0610069Z stp                    16384  1 bridge
2025-12-04T09:03:24.0610331Z llc                    16384  2 bridge,stp
2025-12-04T09:03:24.0610577Z overlay               167936  0
2025-12-04T09:03:24.0610797Z tls                   139264  0
2025-12-04T09:03:24.0611019Z nls_ascii              16384  1
2025-12-04T09:03:24.0611232Z nls_cp437              20480  1
2025-12-04T09:03:24.0611446Z vfat                   24576  1
2025-12-04T09:03:24.0611670Z fat                    86016  1 vfat
2025-12-04T09:03:24.0611909Z sunrpc                700416  1
2025-12-04T09:03:24.0612087Z ena                   184320  0
2025-12-04T09:03:24.0612260Z i8042                  45056  0
2025-12-04T09:03:24.0612444Z serio                  28672  3 i8042
2025-12-04T09:03:24.0612638Z ghash_clmulni_intel    16384  0
2025-12-04T09:03:24.0612828Z button                 24576  0
2025-12-04T09:03:24.0613008Z sch_fq_codel           20480  9
2025-12-04T09:03:24.0613302Z dm_mod                188416  0
2025-12-04T09:03:24.0613484Z fuse                  184320  1
2025-12-04T09:03:24.0613668Z loop                   36864  0
2025-12-04T09:03:24.0613845Z configfs               57344  1
2025-12-04T09:03:24.0614024Z dmi_sysfs              20480  0
2025-12-04T09:03:24.0614202Z crc32_pclmul           16384  0
2025-12-04T09:03:24.0614374Z crc32c_intel           24576  0
2025-12-04T09:03:24.0614553Z efivarfs               24576  1
2025-12-04T09:03:24.0614729Z + modinfo nvidia
2025-12-04T09:03:24.0620299Z filename:       /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko
2025-12-04T09:03:24.0620680Z import_ns:      DMA_BUF
2025-12-04T09:03:24.0620867Z alias:          char-major-195-*
2025-12-04T09:03:24.0621067Z version:        580.82.07
2025-12-04T09:03:24.0621251Z supported:      external
2025-12-04T09:03:24.0621437Z license:        Dual MIT/GPL
2025-12-04T09:03:24.0621668Z firmware:       nvidia/580.82.07/gsp_tu10x.bin
2025-12-04T09:03:24.0621980Z firmware:       nvidia/580.82.07/gsp_ga10x.bin
2025-12-04T09:03:24.0622269Z srcversion:     BA7240A71DCF7DC6FE88C1D
2025-12-04T09:03:24.0622777Z alias:          of:N*T*Cnvidia,tegra264-displayC*
2025-12-04T09:03:24.0623125Z alias:          of:N*T*Cnvidia,tegra264-display
2025-12-04T09:03:24.0623446Z alias:          of:N*T*Cnvidia,tegra234-displayC*
2025-12-04T09:03:24.0623756Z alias:          of:N*T*Cnvidia,tegra234-display
2025-12-04T09:03:24.0624059Z alias:          pci:v000010DEd*sv*sd*bc06sc80i00*
2025-12-04T09:03:24.0624480Z alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
2025-12-04T09:03:24.0624925Z alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
2025-12-04T09:03:24.0625211Z depends:        i2c-core,drm
2025-12-04T09:03:24.0625451Z retpoline:      Y
2025-12-04T09:03:24.0625654Z name:           nvidia
2025-12-04T09:03:24.0625999Z vermagic:       6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 
2025-12-04T09:03:24.0626442Z parm:           NvSwitchRegDwords:NvSwitch regkey (charp)
2025-12-04T09:03:24.0626867Z parm:           NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp)
2025-12-04T09:03:24.0627250Z parm:           NVreg_ResmanDebugLevel:int
2025-12-04T09:03:24.0627547Z parm:           NVreg_RmLogonRC:int
2025-12-04T09:03:24.0627814Z parm:           NVreg_ModifyDeviceFiles:int
2025-12-04T09:03:24.0628098Z parm:           NVreg_DeviceFileUID:int
2025-12-04T09:03:24.0628382Z parm:           NVreg_DeviceFileGID:int
2025-12-04T09:03:24.0628648Z parm:           NVreg_DeviceFileMode:int
2025-12-04T09:03:24.0628973Z parm:           NVreg_InitializeSystemMemoryAllocations:int
2025-12-04T09:03:24.0629324Z parm:           NVreg_UsePageAttributeTable:int
2025-12-04T09:03:24.0629621Z parm:           NVreg_EnablePCIeGen3:int
2025-12-04T09:03:24.0629880Z parm:           NVreg_EnableMSI:int
2025-12-04T09:03:24.0630158Z parm:           NVreg_EnableStreamMemOPs:int
2025-12-04T09:03:24.0630490Z parm:           NVreg_RestrictProfilingToAdminUsers:int
2025-12-04T09:03:24.0630862Z parm:           NVreg_PreserveVideoMemoryAllocations:int
2025-12-04T09:03:24.0631235Z parm:           NVreg_EnableS0ixPowerManagement:int
2025-12-04T09:03:24.0631708Z parm:           NVreg_S0ixPowerManagementVideoMemoryThreshold:int
2025-12-04T09:03:24.0632019Z parm:           NVreg_DynamicPowerManagement:int
2025-12-04T09:03:24.0632327Z parm:           NVreg_DynamicPowerManagementVideoMemoryThreshold:int
2025-12-04T09:03:24.0632635Z parm:           NVreg_EnableGpuFirmware:int
2025-12-04T09:03:24.0632893Z parm:           NVreg_EnableGpuFirmwareLogs:int
2025-12-04T09:03:24.0633198Z parm:           NVreg_OpenRmEnableUnsupportedGpus:int
2025-12-04T09:03:24.0633488Z parm:           NVreg_EnableUserNUMAManagement:int
2025-12-04T09:03:24.0633743Z parm:           NVreg_MemoryPoolSize:int
2025-12-04T09:03:24.0633980Z parm:           NVreg_KMallocHeapMaxSize:int
2025-12-04T09:03:24.0634214Z parm:           NVreg_VMallocHeapMaxSize:int
2025-12-04T09:03:24.0634449Z parm:           NVreg_IgnoreMMIOCheck:int
2025-12-04T09:03:24.0634851Z parm:           NVreg_NvLinkDisable:int
2025-12-04T09:03:24.0635106Z parm:           NVreg_EnablePCIERelaxedOrderingMode:int
2025-12-04T09:03:24.0635376Z parm:           NVreg_RegisterPCIDriver:int
2025-12-04T09:03:24.0635636Z parm:           NVreg_RegisterPlatformDeviceDriver:int
2025-12-04T09:03:24.0635891Z parm:           NVreg_EnableResizableBar:int
2025-12-04T09:03:24.0636153Z parm:           NVreg_EnableDbgBreakpoint:int
2025-12-04T09:03:24.0636407Z parm:           NVreg_EnableNonblockingOpen:int
2025-12-04T09:03:24.0636656Z parm:           NVreg_CoherentGPUMemoryMode:charp
2025-12-04T09:03:24.0636901Z parm:           NVreg_RegistryDwords:charp
2025-12-04T09:03:24.0637148Z parm:           NVreg_RegistryDwordsPerDevice:charp
2025-12-04T09:03:24.0637388Z parm:           NVreg_RmMsg:charp
2025-12-04T09:03:24.0637593Z parm:           NVreg_GpuBlacklist:charp
2025-12-04T09:03:24.0637843Z parm:           NVreg_TemporaryFilePath:charp
2025-12-04T09:03:24.0638082Z parm:           NVreg_ExcludedGpus:charp
2025-12-04T09:03:24.0638299Z parm:           NVreg_DmaRemapPeerMmio:int
2025-12-04T09:03:24.0638535Z parm:           NVreg_RmNvlinkBandwidth:charp
2025-12-04T09:03:24.0638923Z parm:           NVreg_RmNvlinkBandwidthLinkCount:int
2025-12-04T09:03:24.0639171Z parm:           NVreg_ImexChannelCount:int
2025-12-04T09:03:24.0639403Z parm:           NVreg_CreateImexChannel0:int
2025-12-04T09:03:24.0639652Z parm:           NVreg_GrdmaPciTopoCheckOverride:int
2025-12-04T09:03:24.0640010Z parm:           rm_firmware_active:charp
2025-12-04T09:03:24.0640212Z + set +e
2025-12-04T09:03:24.0640358Z + nvidia-smi
2025-12-04T09:03:25.4899249Z Thu Dec  4 09:03:25 2025       
2025-12-04T09:03:25.4900013Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:03:25.4900976Z | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
2025-12-04T09:03:25.4901936Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:03:25.4903294Z | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:03:25.4904425Z | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:03:25.4905290Z |                                         |                        |               MIG M. |
2025-12-04T09:03:25.4905948Z |=========================================+========================+======================|
2025-12-04T09:03:25.4969029Z |   0  NVIDIA L4                      Off |   00000000:35:00.0 Off |                    0 |
2025-12-04T09:03:25.4969450Z | N/A   35C    P0             30W /   72W |       0MiB /  23034MiB |      4%      Default |
2025-12-04T09:03:25.4969800Z |                                         |                        |                  N/A |
2025-12-04T09:03:25.4970156Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:03:25.4970650Z 
2025-12-04T09:03:25.4970835Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:03:25.4971228Z | Processes:                                                                              |
2025-12-04T09:03:25.4971662Z |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
2025-12-04T09:03:25.4972032Z |        ID   ID                                                               Usage      |
2025-12-04T09:03:25.4972327Z |=========================================================================================|
2025-12-04T09:03:25.4974218Z |  No running processes found                                                             |
2025-12-04T09:03:25.4974659Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:03:25.8213139Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
2025-12-04T09:03:27.2458346Z NVIDIA L4
2025-12-04T09:03:27.4269878Z + NVIDIA_SMI_STATUS=0
2025-12-04T09:03:27.4270425Z + '[' 0 -eq 0 ']'
2025-12-04T09:03:27.4270640Z + echo 'INFO: Ignoring allowed status 0'
2025-12-04T09:03:27.4270859Z + set -e
2025-12-04T09:03:27.4271037Z INFO: Ignoring allowed status 0
2025-12-04T09:03:27.4277875Z == Installing nvidia container toolkit for amzn2023 ==
2025-12-04T09:03:27.4282076Z + sudo yum install -y yum-utils
2025-12-04T09:03:27.8082834Z Last metadata expiration check: 0:07:57 ago on Thu Dec  4 08:55:30 2025.
2025-12-04T09:03:27.8315376Z Package dnf-utils-4.3.0-13.amzn2023.0.5.noarch is already installed.
2025-12-04T09:03:27.8743182Z Dependencies resolved.
2025-12-04T09:03:27.8972948Z Nothing to do.
2025-12-04T09:03:27.8973291Z Complete!
2025-12-04T09:03:27.9846329Z + [[ amzn2023 == \a\m\z\n\2\0\2\3 ]]
2025-12-04T09:03:27.9846952Z + YUM_REPO_URL=https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:03:27.9847864Z + sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:03:28.3546519Z Adding repo from: https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:03:28.3945932Z + sudo yum install -y nvidia-container-toolkit-1.17.8 libnvidia-container-tools-1.17.8 libnvidia-container1-1.17.8 nvidia-container-toolkit-base-1.17.8
2025-12-04T09:03:28.9132584Z nvidia-container-toolkit                         24 kB/s | 833  B     00:00    
2025-12-04T09:03:28.9792047Z Dependencies resolved.
2025-12-04T09:03:29.0017635Z ================================================================================
2025-12-04T09:03:29.0018093Z  Package                       Arch   Version    Repository                Size
2025-12-04T09:03:29.0018464Z ================================================================================
2025-12-04T09:03:29.0018755Z Downgrading:
2025-12-04T09:03:29.0019116Z  libnvidia-container-tools     x86_64 1.17.8-1   nvidia-container-toolkit  40 k
2025-12-04T09:03:29.0019713Z  libnvidia-container1          x86_64 1.17.8-1   nvidia-container-toolkit 1.0 M
2025-12-04T09:03:29.0020244Z  nvidia-container-toolkit      x86_64 1.17.8-1   nvidia-container-toolkit 1.2 M
2025-12-04T09:03:29.0020787Z  nvidia-container-toolkit-base x86_64 1.17.8-1   nvidia-container-toolkit 5.8 M
2025-12-04T09:03:29.0021118Z 
2025-12-04T09:03:29.0021209Z Transaction Summary
2025-12-04T09:03:29.0021447Z ================================================================================
2025-12-04T09:03:29.0021732Z Downgrade  4 Packages
2025-12-04T09:03:29.0021871Z 
2025-12-04T09:03:29.0021972Z Total download size: 8.0 M
2025-12-04T09:03:29.0023517Z Downloading Packages:
2025-12-04T09:03:29.1078887Z (1/4): libnvidia-container-tools-1.17.8-1.x86_6 387 kB/s |  40 kB     00:00    
2025-12-04T09:03:29.1555400Z (2/4): nvidia-container-toolkit-1.17.8-1.x86_64 8.2 MB/s | 1.2 MB     00:00    
2025-12-04T09:03:29.2010690Z (3/4): libnvidia-container1-1.17.8-1.x86_64.rpm 5.0 MB/s | 1.0 MB     00:00    
2025-12-04T09:03:29.2728036Z (4/4): nvidia-container-toolkit-base-1.17.8-1.x  35 MB/s | 5.8 MB     00:00    
2025-12-04T09:03:29.2736375Z --------------------------------------------------------------------------------
2025-12-04T09:03:29.2739078Z Total                                            30 MB/s | 8.0 MB     00:00     
2025-12-04T09:03:29.2741659Z Running transaction check
2025-12-04T09:03:29.2879417Z Transaction check succeeded.
2025-12-04T09:03:29.2880313Z Running transaction test
2025-12-04T09:03:29.3309170Z Transaction test succeeded.
2025-12-04T09:03:29.3312202Z Running transaction
2025-12-04T09:03:29.9209024Z   Preparing        :                                                        1/1 
2025-12-04T09:03:30.0557383Z   Downgrading      : nvidia-container-toolkit-base-1.17.8-1.x86_64          1/8 
2025-12-04T09:03:30.0809955Z   Downgrading      : libnvidia-container1-1.17.8-1.x86_64                   2/8 
2025-12-04T09:03:30.1328577Z   Running scriptlet: libnvidia-container1-1.17.8-1.x86_64                   2/8 
2025-12-04T09:03:30.2273127Z   Downgrading      : libnvidia-container-tools-1.17.8-1.x86_64              3/8 
2025-12-04T09:03:30.2684423Z   Downgrading      : nvidia-container-toolkit-1.17.8-1.x86_64               4/8 
2025-12-04T09:03:30.3469468Z   Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64               4/8 
2025-12-04T09:03:30.3526455Z   Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64               5/8 
2025-12-04T09:03:30.3527655Z   Cleanup          : nvidia-container-toolkit-1.18.1-1.x86_64               5/8 
2025-12-04T09:03:30.3861542Z   Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64               5/8 
2025-12-04T09:03:30.3912557Z   Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64              6/8 
2025-12-04T09:03:30.3913931Z   Cleanup          : libnvidia-container-tools-1.18.1-1.x86_64              6/8 
2025-12-04T09:03:30.4230309Z   Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64              6/8 
2025-12-04T09:03:30.4293466Z   Running scriptlet: libnvidia-container1-1.18.1-1.x86_64                   7/8 
2025-12-04T09:03:30.4294470Z   Cleanup          : libnvidia-container1-1.18.1-1.x86_64                   7/8 
2025-12-04T09:03:30.4613118Z   Running scriptlet: libnvidia-container1-1.18.1-1.x86_64                   7/8 
2025-12-04T09:03:30.4667645Z   Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:03:30.4668677Z   Cleanup          : nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:03:30.4927205Z   Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:03:30.5359189Z   Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64               8/8 
2025-12-04T09:04:16.7571633Z   Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:04:16.7574101Z   Verifying        : libnvidia-container-tools-1.17.8-1.x86_64              1/8 
2025-12-04T09:04:16.7574752Z   Verifying        : libnvidia-container-tools-1.18.1-1.x86_64              2/8 
2025-12-04T09:04:16.7575278Z   Verifying        : libnvidia-container1-1.17.8-1.x86_64                   3/8 
2025-12-04T09:04:16.7575779Z   Verifying        : libnvidia-container1-1.18.1-1.x86_64                   4/8 
2025-12-04T09:04:16.7576289Z   Verifying        : nvidia-container-toolkit-1.17.8-1.x86_64               5/8 
2025-12-04T09:04:16.7576703Z   Verifying        : nvidia-container-toolkit-1.18.1-1.x86_64               6/8 
2025-12-04T09:04:16.7577090Z   Verifying        : nvidia-container-toolkit-base-1.17.8-1.x86_64          7/8 
2025-12-04T09:04:16.8958425Z   Verifying        : nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8================================================================================
2025-12-04T09:04:16.8958974Z WARNING:
2025-12-04T09:04:16.8959212Z   A newer release of "Amazon Linux" is available.
2025-12-04T09:04:16.8959435Z 
2025-12-04T09:04:16.8959525Z   Available Versions:
2025-12-04T09:04:16.8959667Z 
2025-12-04T09:04:16.8959767Z   Version 2023.9.20250929:
2025-12-04T09:04:16.8960169Z     Run the following command to upgrade to 2023.9.20250929:
2025-12-04T09:04:16.8960415Z 
2025-12-04T09:04:16.8960550Z       dnf upgrade --releasever=2023.9.20250929
2025-12-04T09:04:16.8960752Z 
2025-12-04T09:04:16.8960842Z     Release notes:
2025-12-04T09:04:16.8961262Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20250929.html
2025-12-04T09:04:16.8961628Z 
2025-12-04T09:04:16.8961711Z   Version 2023.9.20251014:
2025-12-04T09:04:16.8962001Z     Run the following command to upgrade to 2023.9.20251014:
2025-12-04T09:04:16.8962232Z 
2025-12-04T09:04:16.8962340Z       dnf upgrade --releasever=2023.9.20251014
2025-12-04T09:04:16.8962542Z 
2025-12-04T09:04:16.8962618Z     Release notes:
2025-12-04T09:04:16.8962979Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251014.html
2025-12-04T09:04:16.8963328Z 
2025-12-04T09:04:16.8963418Z   Version 2023.9.20251020:
2025-12-04T09:04:16.8963965Z     Run the following command to upgrade to 2023.9.20251020:
2025-12-04T09:04:16.8964221Z 
2025-12-04T09:04:16.8964335Z       dnf upgrade --releasever=2023.9.20251020
2025-12-04T09:04:16.8964556Z 
2025-12-04T09:04:16.8964645Z     Release notes:
2025-12-04T09:04:16.8965032Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251020.html
2025-12-04T09:04:16.8965372Z 
2025-12-04T09:04:16.8965453Z   Version 2023.9.20251027:
2025-12-04T09:04:16.8965742Z     Run the following command to upgrade to 2023.9.20251027:
2025-12-04T09:04:16.8965970Z 
2025-12-04T09:04:16.8966081Z       dnf upgrade --releasever=2023.9.20251027
2025-12-04T09:04:16.8966280Z 
2025-12-04T09:04:16.8966346Z     Release notes:
2025-12-04T09:04:16.8966643Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251027.html
2025-12-04T09:04:16.8966923Z 
2025-12-04T09:04:16.8966989Z   Version 2023.9.20251105:
2025-12-04T09:04:16.8967220Z     Run the following command to upgrade to 2023.9.20251105:
2025-12-04T09:04:16.8967407Z 
2025-12-04T09:04:16.8967499Z       dnf upgrade --releasever=2023.9.20251105
2025-12-04T09:04:16.8967661Z 
2025-12-04T09:04:16.8967725Z     Release notes:
2025-12-04T09:04:16.8968194Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251105.html
2025-12-04T09:04:16.8968463Z 
2025-12-04T09:04:16.8968535Z   Version 2023.9.20251110:
2025-12-04T09:04:16.8968749Z     Run the following command to upgrade to 2023.9.20251110:
2025-12-04T09:04:16.8968938Z 
2025-12-04T09:04:16.8969022Z       dnf upgrade --releasever=2023.9.20251110
2025-12-04T09:04:16.8969169Z 
2025-12-04T09:04:16.8969239Z     Release notes:
2025-12-04T09:04:16.8969520Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251110.html
2025-12-04T09:04:16.8969801Z 
2025-12-04T09:04:16.8969865Z   Version 2023.9.20251117:
2025-12-04T09:04:16.8970085Z     Run the following command to upgrade to 2023.9.20251117:
2025-12-04T09:04:16.8970263Z 
2025-12-04T09:04:16.8970351Z       dnf upgrade --releasever=2023.9.20251117
2025-12-04T09:04:16.8970505Z 
2025-12-04T09:04:16.8970565Z     Release notes:
2025-12-04T09:04:16.8970852Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251117.html
2025-12-04T09:04:16.8971124Z 
2025-12-04T09:04:16.8971218Z ================================================================================
2025-12-04T09:04:16.9422958Z  
2025-12-04T09:04:16.9423081Z 
2025-12-04T09:04:16.9423160Z Downgraded:
2025-12-04T09:04:16.9423524Z   libnvidia-container-tools-1.17.8-1.x86_64                                     
2025-12-04T09:04:16.9424050Z   libnvidia-container1-1.17.8-1.x86_64                                          
2025-12-04T09:04:16.9424545Z   nvidia-container-toolkit-1.17.8-1.x86_64                                      
2025-12-04T09:04:16.9425070Z   nvidia-container-toolkit-base-1.17.8-1.x86_64                                 
2025-12-04T09:04:16.9425393Z 
2025-12-04T09:04:16.9425484Z Complete!
2025-12-04T09:04:17.0135681Z + sudo systemctl restart docker
2025-12-04T09:04:22.3383785Z Thu Dec  4 09:04:22 2025       
2025-12-04T09:04:22.3384240Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:04:22.3384765Z | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
2025-12-04T09:04:22.3385218Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:04:22.3385682Z | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:04:22.3386208Z | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:04:22.3386597Z |                                         |                        |               MIG M. |
2025-12-04T09:04:22.3386889Z |=========================================+========================+======================|
2025-12-04T09:04:22.3455623Z |   0  NVIDIA L4                      On  |   00000000:35:00.0 Off |                    0 |
2025-12-04T09:04:22.3456406Z | N/A   35C    P0             30W /   72W |       0MiB /  23034MiB |      4%      Default |
2025-12-04T09:04:22.3456795Z |                                         |                        |                  N/A |
2025-12-04T09:04:22.3457171Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:04:22.3457440Z 
2025-12-04T09:04:22.3457598Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:04:22.3457987Z | Processes:                                                                              |
2025-12-04T09:04:22.3458407Z |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
2025-12-04T09:04:22.3458787Z |        ID   ID                                                               Usage      |
2025-12-04T09:04:22.3459086Z |=========================================================================================|
2025-12-04T09:04:22.3460286Z |  No running processes found                                                             |
2025-12-04T09:04:22.3460718Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:04:22.5054171Z Unable to find image 'public.ecr.aws/docker/library/python:3.13' locally
2025-12-04T09:04:22.6988049Z 3.13: Pulling from docker/library/python
2025-12-04T09:04:22.7759308Z 53c88f1dfeb7: Pulling fs layer
2025-12-04T09:04:22.7759570Z eae668646f44: Pulling fs layer
2025-12-04T09:04:22.7760116Z ff2e6e687b6c: Pulling fs layer
2025-12-04T09:04:22.7760330Z 7c40a3faff76: Pulling fs layer
2025-12-04T09:04:22.7760530Z 967a3b1c8fef: Pulling fs layer
2025-12-04T09:04:22.7760712Z a64e1a44f22a: Pulling fs layer
2025-12-04T09:04:22.7760913Z 52655f8a5bcc: Pulling fs layer
2025-12-04T09:04:22.7761098Z 52655f8a5bcc: Waiting
2025-12-04T09:04:22.7762497Z a64e1a44f22a: Waiting
2025-12-04T09:04:22.7762775Z 967a3b1c8fef: Waiting
2025-12-04T09:04:22.7763067Z 7c40a3faff76: Waiting
2025-12-04T09:04:22.9102802Z eae668646f44: Verifying Checksum
2025-12-04T09:04:22.9103184Z eae668646f44: Download complete
2025-12-04T09:04:22.9632502Z 53c88f1dfeb7: Verifying Checksum
2025-12-04T09:04:22.9633998Z 53c88f1dfeb7: Download complete
2025-12-04T09:04:22.9747297Z ff2e6e687b6c: Verifying Checksum
2025-12-04T09:04:22.9748579Z ff2e6e687b6c: Download complete
2025-12-04T09:04:23.0123901Z 967a3b1c8fef: Verifying Checksum
2025-12-04T09:04:23.0124385Z 967a3b1c8fef: Download complete
2025-12-04T09:04:23.0625284Z 52655f8a5bcc: Verifying Checksum
2025-12-04T09:04:23.0625589Z 52655f8a5bcc: Download complete
2025-12-04T09:04:23.0898157Z a64e1a44f22a: Verifying Checksum
2025-12-04T09:04:23.0898521Z a64e1a44f22a: Download complete
2025-12-04T09:04:23.5920086Z 7c40a3faff76: Verifying Checksum
2025-12-04T09:04:23.5920458Z 7c40a3faff76: Download complete
2025-12-04T09:04:24.2653767Z 53c88f1dfeb7: Pull complete
2025-12-04T09:04:24.7886603Z eae668646f44: Pull complete
2025-12-04T09:04:26.5639773Z ff2e6e687b6c: Pull complete
2025-12-04T09:04:31.2934208Z 7c40a3faff76: Pull complete
2025-12-04T09:04:31.5408518Z 967a3b1c8fef: Pull complete
2025-12-04T09:04:32.2780175Z a64e1a44f22a: Pull complete
2025-12-04T09:04:32.4474606Z 52655f8a5bcc: Pull complete
2025-12-04T09:04:32.5077255Z Digest: sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0
2025-12-04T09:04:32.5308537Z Status: Downloaded newer image for public.ecr.aws/docker/library/python:3.13
2025-12-04T09:04:40.2205463Z Thu Dec  4 09:04:40 2025       
2025-12-04T09:04:40.2205908Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:04:40.2206414Z | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
2025-12-04T09:04:40.2206894Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:04:40.2207376Z | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:04:40.2208235Z | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:04:40.2208645Z |                                         |                        |               MIG M. |
2025-12-04T09:04:40.2208982Z |=========================================+========================+======================|
2025-12-04T09:04:40.2313559Z |   0  NVIDIA L4                      On  |   00000000:35:00.0 Off |                    0 |
2025-12-04T09:04:40.2314049Z | N/A   34C    P8             12W /   72W |       0MiB /  23034MiB |      0%      Default |
2025-12-04T09:04:40.2314415Z |                                         |                        |                  N/A |
2025-12-04T09:04:40.2314785Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:04:40.2316249Z 
2025-12-04T09:04:40.2316510Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:04:40.2316971Z | Processes:                                                                              |
2025-12-04T09:04:40.2317709Z |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
2025-12-04T09:04:40.2318340Z |        ID   ID                                                               Usage      |
2025-12-04T09:04:40.2318663Z |=========================================================================================|
2025-12-04T09:04:40.2321470Z |  No running processes found                                                             |
2025-12-04T09:04:40.2321984Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:04:41.5998512Z Command completed after 1 attempt(s).
2025-12-04T09:04:41.6090319Z Prepare all required actions
2025-12-04T09:04:41.6114632Z ##[group]Run ./.github/actions/get-workflow-job-id
2025-12-04T09:04:41.6114887Z with:
2025-12-04T09:04:41.6115429Z   github-token: ***
2025-12-04T09:04:41.6115609Z env:
2025-12-04T09:04:41.6115768Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:04:41.6116016Z   HAS_NVIDIA_GPU: true
2025-12-04T09:04:41.6116410Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:04:41.6116735Z ##[endgroup]
2025-12-04T09:04:41.6130591Z ##[group]Run set -eux
2025-12-04T09:04:41.6130810Z [36;1mset -eux[0m
2025-12-04T09:04:41.6131147Z [36;1mpython3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}"[0m
2025-12-04T09:04:41.6143784Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:04:41.6144085Z env:
2025-12-04T09:04:41.6144268Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:04:41.6144471Z   HAS_NVIDIA_GPU: true
2025-12-04T09:04:41.6144752Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:04:41.6145157Z   GITHUB_TOKEN: ***
2025-12-04T09:04:41.6145338Z ##[endgroup]
2025-12-04T09:04:41.6175584Z + python3 .github/scripts/get_workflow_job_id.py 19922768520 i-0e5520d20214059b0
2025-12-04T09:04:42.8178737Z Setting output job-id=57116084862
2025-12-04T09:04:42.8179817Z Setting output job-name=linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check)
2025-12-04T09:04:42.8284545Z ##[group]Run python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
2025-12-04T09:04:42.8285138Z [36;1mpython3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84[0m
2025-12-04T09:04:42.8285826Z [36;1mpython3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 &[0m
2025-12-04T09:04:42.8286417Z [36;1mecho "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:04:42.8294150Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:04:42.8294431Z env:
2025-12-04T09:04:42.8294589Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:04:42.8294777Z   HAS_NVIDIA_GPU: true
2025-12-04T09:04:42.8295015Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:04:42.8295281Z   JOB_ID: 57116084862
2025-12-04T09:04:42.8295749Z   JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check)
2025-12-04T09:04:42.8296215Z   WORKFLOW_NAME: trunk
2025-12-04T09:04:42.8296391Z   WORKFLOW_RUN_ID: 19922768520
2025-12-04T09:04:42.8296607Z   MONITOR_LOG_INTERVAL: 5
2025-12-04T09:04:42.8296805Z   MONITOR_DATA_COLLECT_INTERVAL: 1
2025-12-04T09:04:42.8297007Z ##[endgroup]
2025-12-04T09:04:43.0923583Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T09:04:43.4218282Z Collecting psutil==5.9.8
2025-12-04T09:04:43.4367531Z   Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB)
2025-12-04T09:04:43.5044719Z Collecting dataclasses_json==0.6.7
2025-12-04T09:04:43.5077763Z   Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)
2025-12-04T09:04:43.5328232Z Collecting nvidia-ml-py==11.525.84
2025-12-04T09:04:43.5362726Z   Downloading nvidia_ml_py-11.525.84-py3-none-any.whl (34 kB)
2025-12-04T09:04:43.5664771Z Collecting typing-inspect<1,>=0.4.0
2025-12-04T09:04:43.5691935Z   Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
2025-12-04T09:04:43.6637987Z Collecting marshmallow<4.0.0,>=3.18.0
2025-12-04T09:04:43.6663897Z   Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB)
2025-12-04T09:04:43.7166555Z Collecting packaging>=17.0
2025-12-04T09:04:43.7198598Z   Downloading packaging-25.0-py3-none-any.whl (66 kB)
2025-12-04T09:04:43.7416700Z Collecting mypy-extensions>=0.3.0
2025-12-04T09:04:43.7450348Z   Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB)
2025-12-04T09:04:43.7907657Z Collecting typing-extensions>=3.7.4
2025-12-04T09:04:43.7936738Z   Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
2025-12-04T09:04:43.8793251Z Installing collected packages: typing-extensions, packaging, mypy-extensions, typing-inspect, marshmallow, psutil, nvidia-ml-py, dataclasses-json
2025-12-04T09:04:44.1265615Z Successfully installed dataclasses-json-0.6.7 marshmallow-3.26.1 mypy-extensions-1.1.0 nvidia-ml-py-11.525.84 packaging-25.0 psutil-5.9.8 typing-extensions-4.15.0 typing-inspect-0.9.0
2025-12-04T09:04:44.2774664Z Prepare all required actions
2025-12-04T09:04:44.2775011Z Getting action download info
2025-12-04T09:04:44.4468168Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T09:04:44.6887547Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093)
2025-12-04T09:04:45.1401996Z ##[group]Run ./.github/actions/download-build-artifacts
2025-12-04T09:04:45.1402290Z with:
2025-12-04T09:04:45.1402492Z   name: linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T09:04:45.1402741Z   s3-bucket: gha-artifacts
2025-12-04T09:04:45.1402955Z env:
2025-12-04T09:04:45.1403115Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:04:45.1403331Z   HAS_NVIDIA_GPU: true
2025-12-04T09:04:45.1403576Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:04:45.1403858Z ##[endgroup]
2025-12-04T09:04:45.1430312Z ##[group]Run seemethere/download-artifact-s3@v4
2025-12-04T09:04:45.1430599Z with:
2025-12-04T09:04:45.1430832Z   name: linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T09:04:45.1431090Z   s3-bucket: gha-artifacts
2025-12-04T09:04:45.1431307Z   region: us-east-1
2025-12-04T09:04:45.1431483Z env:
2025-12-04T09:04:45.1431644Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:04:45.1431851Z   HAS_NVIDIA_GPU: true
2025-12-04T09:04:45.1432106Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:04:45.1432379Z ##[endgroup]
2025-12-04T09:04:45.5936596Z (node:58808) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
2025-12-04T09:04:45.5937051Z 
2025-12-04T09:04:45.5937234Z Please migrate your code to use AWS SDK for JavaScript (v3).
2025-12-04T09:04:45.5937749Z For more information, check the migration guide at https://a.co/7PzMCcy
2025-12-04T09:04:45.5938286Z (Use `node --trace-warnings ...` to show where the warning was created)
2025-12-04T09:04:45.8667600Z Found 1 objects with prefix pytorch/pytorch/19922768520/linux-jammy-cuda12.8-py3.10-gcc11/
2025-12-04T09:04:45.8668338Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2025-12-04T09:04:54.1384307Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2025-12-04T09:04:54.1389486Z Artifact download has finished successfully
2025-12-04T09:04:54.1661365Z ##[group]Run unzip -o artifacts.zip
2025-12-04T09:04:54.1661631Z [36;1munzip -o artifacts.zip[0m
2025-12-04T09:04:54.1669279Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:04:54.1669590Z env:
2025-12-04T09:04:54.1669764Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:04:54.1669972Z   HAS_NVIDIA_GPU: true
2025-12-04T09:04:54.1670220Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:04:54.1670491Z ##[endgroup]
2025-12-04T09:04:54.1751871Z Archive:  artifacts.zip
2025-12-04T09:04:54.1753017Z    creating: dist/
2025-12-04T09:04:54.1869622Z   inflating: dist/.ninja_log         
2025-12-04T09:04:56.1721612Z   inflating: dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl  
2025-12-04T09:04:56.1722080Z    creating: build/
2025-12-04T09:04:56.1722338Z    creating: build/custom_test_artifacts/
2025-12-04T09:04:56.1722734Z    creating: build/custom_test_artifacts/custom-op-build/
2025-12-04T09:04:56.1723169Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/
2025-12-04T09:04:56.1723689Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/
2025-12-04T09:04:56.1730385Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml  
2025-12-04T09:04:56.1730874Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/
2025-12-04T09:04:56.1731734Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeSystem.cmake  
2025-12-04T09:04:56.1732439Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:04:56.1733425Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:04:56.1734801Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c  
2025-12-04T09:04:56.1736493Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/a.out  
2025-12-04T09:04:56.1737296Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake  
2025-12-04T09:04:56.1737827Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:04:56.1738323Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:04:56.1740664Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp  
2025-12-04T09:04:56.1742275Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out  
2025-12-04T09:04:56.1743238Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake  
2025-12-04T09:04:56.1745013Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin  
2025-12-04T09:04:56.1746993Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin  
2025-12-04T09:04:56.1747592Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/
2025-12-04T09:04:56.1748096Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/
2025-12-04T09:04:56.1795350Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii  
2025-12-04T09:04:56.1843985Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp  
2025-12-04T09:04:56.1844930Z  extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id  
2025-12-04T09:04:56.1895881Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii  
2025-12-04T09:04:56.1896852Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c  
2025-12-04T09:04:56.1897851Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu  
2025-12-04T09:04:56.1898779Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c  
2025-12-04T09:04:56.1899692Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx  
2025-12-04T09:04:56.1900572Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin  
2025-12-04T09:04:56.1901449Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin  
2025-12-04T09:04:56.1902557Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c  
2025-12-04T09:04:56.1903636Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o  
2025-12-04T09:04:56.1904448Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin  
2025-12-04T09:04:56.1905241Z  extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c  
2025-12-04T09:04:56.1906027Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin  
2025-12-04T09:04:56.1907027Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c  
2025-12-04T09:04:56.1908048Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o  
2025-12-04T09:04:56.1910371Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu  
2025-12-04T09:04:56.1972889Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out  
2025-12-04T09:04:56.1973794Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake  
2025-12-04T09:04:56.2036445Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin  
2025-12-04T09:04:56.2037215Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/
2025-12-04T09:04:56.2037775Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/
2025-12-04T09:04:56.2038372Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache  
2025-12-04T09:04:56.2038975Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/
2025-12-04T09:04:56.2039637Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts  
2025-12-04T09:04:56.2040461Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make  
2025-12-04T09:04:56.2041177Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make  
2025-12-04T09:04:56.2041834Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt  
2025-12-04T09:04:56.2042506Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake  
2025-12-04T09:04:56.2043211Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make  
2025-12-04T09:04:56.2043905Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake  
2025-12-04T09:04:56.2045219Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make  
2025-12-04T09:04:56.2046156Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make  
2025-12-04T09:04:56.2064135Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d  
2025-12-04T09:04:56.2226122Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o  
2025-12-04T09:04:56.2226795Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/
2025-12-04T09:04:56.2227511Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts  
2025-12-04T09:04:56.2228291Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make  
2025-12-04T09:04:56.2229041Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make  
2025-12-04T09:04:56.2229748Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt  
2025-12-04T09:04:56.2230686Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake  
2025-12-04T09:04:56.2231518Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make  
2025-12-04T09:04:56.2232294Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake  
2025-12-04T09:04:56.2233011Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make  
2025-12-04T09:04:56.2233747Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make  
2025-12-04T09:04:56.2252001Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d  
2025-12-04T09:04:56.2318147Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o  
2025-12-04T09:04:56.2319246Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake  
2025-12-04T09:04:56.2320110Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt  
2025-12-04T09:04:56.2320737Z  extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks  
2025-12-04T09:04:56.2321332Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2  
2025-12-04T09:04:56.2322680Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake  
2025-12-04T09:04:56.2323405Z   inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc  
2025-12-04T09:04:56.2325864Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt  
2025-12-04T09:04:56.2326687Z   inflating: build/custom_test_artifacts/custom-op-build/Makefile  
2025-12-04T09:04:56.2327458Z   inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake  
2025-12-04T09:04:56.2463884Z   inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so  
2025-12-04T09:04:56.2507860Z   inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops  
2025-12-04T09:04:56.2508364Z    creating: build/custom_test_artifacts/jit-hook-build/
2025-12-04T09:04:56.2508795Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/
2025-12-04T09:04:56.2509321Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/
2025-12-04T09:04:56.2515740Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml  
2025-12-04T09:04:56.2516329Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/
2025-12-04T09:04:56.2516894Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeSystem.cmake  
2025-12-04T09:04:56.2517718Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:04:56.2518314Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:04:56.2520399Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c  
2025-12-04T09:04:56.2521810Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/a.out  
2025-12-04T09:04:56.2522714Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake  
2025-12-04T09:04:56.2523346Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:04:56.2523956Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:04:56.2526082Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp  
2025-12-04T09:04:56.2527423Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out  
2025-12-04T09:04:56.2528513Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake  
2025-12-04T09:04:56.2530350Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin  
2025-12-04T09:04:56.2532089Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin  
2025-12-04T09:04:56.2532651Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/
2025-12-04T09:04:56.2533154Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/
2025-12-04T09:04:56.2580493Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii  
2025-12-04T09:04:56.2629087Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp  
2025-12-04T09:04:56.2630059Z  extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id  
2025-12-04T09:04:56.2681102Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii  
2025-12-04T09:04:56.2682055Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c  
2025-12-04T09:04:56.2683129Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu  
2025-12-04T09:04:56.2684142Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c  
2025-12-04T09:04:56.2685080Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx  
2025-12-04T09:04:56.2685971Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin  
2025-12-04T09:04:56.2686860Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin  
2025-12-04T09:04:56.2687609Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c  
2025-12-04T09:04:56.2688531Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o  
2025-12-04T09:04:56.2689370Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin  
2025-12-04T09:04:56.2690095Z  extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c  
2025-12-04T09:04:56.2690948Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin  
2025-12-04T09:04:56.2691920Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c  
2025-12-04T09:04:56.2692903Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o  
2025-12-04T09:04:56.2695792Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu  
2025-12-04T09:04:56.2758610Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out  
2025-12-04T09:04:56.2759517Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake  
2025-12-04T09:04:56.2822190Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin  
2025-12-04T09:04:56.2822923Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/
2025-12-04T09:04:56.2823455Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/
2025-12-04T09:04:56.2824013Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache  
2025-12-04T09:04:56.2824599Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/
2025-12-04T09:04:56.2825272Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts  
2025-12-04T09:04:56.2826238Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make  
2025-12-04T09:04:56.2826965Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make  
2025-12-04T09:04:56.2827520Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt  
2025-12-04T09:04:56.2828069Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake  
2025-12-04T09:04:56.2828762Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make  
2025-12-04T09:04:56.2829586Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake  
2025-12-04T09:04:56.2830312Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make  
2025-12-04T09:04:56.2831405Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make  
2025-12-04T09:04:56.2849574Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d  
2025-12-04T09:04:56.2900871Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o  
2025-12-04T09:04:56.2901769Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake  
2025-12-04T09:04:56.2902486Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt  
2025-12-04T09:04:56.2903101Z  extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks  
2025-12-04T09:04:56.2904059Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2  
2025-12-04T09:04:56.2905800Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake  
2025-12-04T09:04:56.2906372Z   inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc  
2025-12-04T09:04:56.2908664Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt  
2025-12-04T09:04:56.2909619Z   inflating: build/custom_test_artifacts/jit-hook-build/Makefile  
2025-12-04T09:04:56.2910434Z   inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake  
2025-12-04T09:04:56.2941608Z   inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks  
2025-12-04T09:04:56.2942090Z    creating: build/custom_test_artifacts/custom-backend-build/
2025-12-04T09:04:56.2942564Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/
2025-12-04T09:04:56.2943136Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/
2025-12-04T09:04:56.2949577Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml  
2025-12-04T09:04:56.2950240Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/
2025-12-04T09:04:56.2950883Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeSystem.cmake  
2025-12-04T09:04:56.2951561Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:04:56.2952212Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:04:56.2953662Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c  
2025-12-04T09:04:56.2955236Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/a.out  
2025-12-04T09:04:56.2956115Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake  
2025-12-04T09:04:56.2956770Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:04:56.2957303Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:04:56.2959450Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp  
2025-12-04T09:04:56.2961194Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out  
2025-12-04T09:04:56.2962019Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake  
2025-12-04T09:04:56.2963879Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin  
2025-12-04T09:04:56.2965696Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin  
2025-12-04T09:04:56.2966335Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/
2025-12-04T09:04:56.2966876Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/
2025-12-04T09:04:56.3014309Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii  
2025-12-04T09:04:56.3062591Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp  
2025-12-04T09:04:56.3063611Z  extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id  
2025-12-04T09:04:56.3114244Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii  
2025-12-04T09:04:56.3115204Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c  
2025-12-04T09:04:56.3116181Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu  
2025-12-04T09:04:56.3117353Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c  
2025-12-04T09:04:56.3118320Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx  
2025-12-04T09:04:56.3119260Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin  
2025-12-04T09:04:56.3120290Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin  
2025-12-04T09:04:56.3121233Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c  
2025-12-04T09:04:56.3122309Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o  
2025-12-04T09:04:56.3123172Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin  
2025-12-04T09:04:56.3124008Z  extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c  
2025-12-04T09:04:56.3124834Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin  
2025-12-04T09:04:56.3125702Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c  
2025-12-04T09:04:56.3126602Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o  
2025-12-04T09:04:56.3128935Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu  
2025-12-04T09:04:56.3191117Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out  
2025-12-04T09:04:56.3191919Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake  
2025-12-04T09:04:56.3254874Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin  
2025-12-04T09:04:56.3255816Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/
2025-12-04T09:04:56.3256392Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/
2025-12-04T09:04:56.3257004Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache  
2025-12-04T09:04:56.3257637Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/
2025-12-04T09:04:56.3258344Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts  
2025-12-04T09:04:56.3259147Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make  
2025-12-04T09:04:56.3259928Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make  
2025-12-04T09:04:56.3260783Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt  
2025-12-04T09:04:56.3261534Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake  
2025-12-04T09:04:56.3262287Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make  
2025-12-04T09:04:56.3263038Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake  
2025-12-04T09:04:56.3263785Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make  
2025-12-04T09:04:56.3264672Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make  
2025-12-04T09:04:56.3268527Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d  
2025-12-04T09:04:56.3365614Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o  
2025-12-04T09:04:56.3366411Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/
2025-12-04T09:04:56.3367196Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts  
2025-12-04T09:04:56.3368069Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make  
2025-12-04T09:04:56.3368896Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make  
2025-12-04T09:04:56.3369659Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt  
2025-12-04T09:04:56.3370432Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake  
2025-12-04T09:04:56.3371225Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make  
2025-12-04T09:04:56.3372046Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake  
2025-12-04T09:04:56.3372832Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make  
2025-12-04T09:04:56.3373603Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make  
2025-12-04T09:04:56.3391177Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d  
2025-12-04T09:04:56.3436308Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o  
2025-12-04T09:04:56.3437158Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake  
2025-12-04T09:04:56.3437904Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt  
2025-12-04T09:04:56.3438580Z  extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks  
2025-12-04T09:04:56.3439536Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2  
2025-12-04T09:04:56.3441142Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake  
2025-12-04T09:04:56.3441756Z   inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc  
2025-12-04T09:04:56.3444354Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt  
2025-12-04T09:04:56.3445353Z   inflating: build/custom_test_artifacts/custom-backend-build/Makefile  
2025-12-04T09:04:56.3448303Z   inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake  
2025-12-04T09:04:56.3529291Z   inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so  
2025-12-04T09:04:56.3560541Z   inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend  
2025-12-04T09:04:56.3560975Z    creating: build/lib/
2025-12-04T09:04:56.3628172Z   inflating: build/lib/libprotobuf-lite.a  
2025-12-04T09:04:56.3984951Z   inflating: build/lib/libprotobuf.a  
2025-12-04T09:04:56.4385449Z   inflating: build/lib/libprotoc.a   
2025-12-04T09:04:56.4393486Z   inflating: build/lib/libpthreadpool.a  
2025-12-04T09:04:56.4400512Z   inflating: build/lib/libcpuinfo.a  
2025-12-04T09:04:56.4407038Z   inflating: build/lib/libcpuinfo_internals.a  
2025-12-04T09:04:56.4407915Z   inflating: build/lib/libclog.a     
2025-12-04T09:04:56.4423688Z   inflating: build/lib/libpytorch_qnnpack.a  
2025-12-04T09:04:56.4425841Z   inflating: build/lib/libnnpack_reference_layers.a  
2025-12-04T09:04:56.4440663Z   inflating: build/lib/libnnpack.a   
2025-12-04T09:04:56.4588354Z   inflating: build/lib/libmicrokernels-prod.a  
2025-12-04T09:04:56.5271613Z   inflating: build/lib/libmicrokernels-all.a  
2025-12-04T09:04:56.5328038Z   inflating: build/lib/libgtest.a    
2025-12-04T09:04:56.5342526Z   inflating: build/lib/libgmock.a    
2025-12-04T09:04:56.5343293Z   inflating: build/lib/libgtest_main.a  
2025-12-04T09:04:56.5344163Z   inflating: build/lib/libgmock_main.a  
2025-12-04T09:04:56.5416580Z   inflating: build/lib/libXNNPACK.a  
2025-12-04T09:04:56.5477627Z   inflating: build/lib/libbenchmark.a  
2025-12-04T09:04:56.5478346Z   inflating: build/lib/libbenchmark_main.a  
2025-12-04T09:04:56.5479349Z   inflating: build/lib/libjitprofiling.a  
2025-12-04T09:04:56.5486118Z   inflating: build/lib/libittnotify.a  
2025-12-04T09:04:56.5539011Z   inflating: build/lib/libasmjit.a   
2025-12-04T09:04:56.6473495Z   inflating: build/lib/libfbgemm.a   
2025-12-04T09:04:56.6497965Z   inflating: build/lib/libtensorpipe_uv.a  
2025-12-04T09:04:56.6947326Z   inflating: build/lib/libtensorpipe.a  
2025-12-04T09:04:56.7156355Z   inflating: build/lib/libtensorpipe_cuda.a  
2025-12-04T09:04:56.7270554Z   inflating: build/lib/libgloo.a     
2025-12-04T09:04:56.7312248Z   inflating: build/lib/libonnx_proto.a  
2025-12-04T09:04:56.7680020Z   inflating: build/lib/libgloo_cuda.a  
2025-12-04T09:04:56.8264790Z   inflating: build/lib/libonnx.a     
2025-12-04T09:04:56.8280793Z   inflating: build/lib/libfmt.a      
2025-12-04T09:04:57.6493155Z   inflating: build/lib/libdnnl.a     
2025-12-04T09:04:57.6873181Z   inflating: build/lib/libkineto.a   
2025-12-04T09:04:57.6966029Z   inflating: build/lib/libc10.so     
2025-12-04T09:04:57.7005924Z   inflating: build/lib/libc10_cuda.so  
2025-12-04T09:04:57.7007954Z   inflating: build/lib/libcaffe2_nvrtc.so  
2025-12-04T09:04:57.7009563Z   inflating: build/lib/libtorch_global_deps.so  
2025-12-04T09:05:00.1801117Z   inflating: build/lib/libtorch_cpu.so  
2025-12-04T09:05:00.2432060Z   inflating: build/lib/libtorch_nvshmem.so  
2025-12-04T09:05:02.6485225Z   inflating: build/lib/libtorch_cuda.so  
2025-12-04T09:05:02.6486587Z   inflating: build/lib/libtorch.so   
2025-12-04T09:05:02.6528305Z   inflating: build/lib/libtorch_cuda_linalg.so  
2025-12-04T09:05:02.6585943Z   inflating: build/lib/libtorchbind_test.so  
2025-12-04T09:05:02.6601874Z   inflating: build/lib/libjitbackend_test.so  
2025-12-04T09:05:02.6621761Z   inflating: build/lib/libbackend_with_compiler.so  
2025-12-04T09:05:02.6643668Z   inflating: build/lib/libaoti_custom_ops.so  
2025-12-04T09:05:02.6646092Z   inflating: build/lib/libc10d_cuda_test.so  
2025-12-04T09:05:02.6650003Z   inflating: build/lib/libshm.so     
2025-12-04T09:05:02.8546288Z   inflating: build/lib/libtorch_python.so  
2025-12-04T09:05:02.8575413Z   inflating: build/lib/libnnapi_backend.so  
2025-12-04T09:05:02.8575732Z    creating: build/bin/
2025-12-04T09:05:02.8939643Z   inflating: build/bin/protoc-3.13.0.0  
2025-12-04T09:05:02.9303835Z   inflating: build/bin/protoc        
2025-12-04T09:05:02.9351312Z   inflating: build/bin/c10_AllocatorConfig_test  
2025-12-04T09:05:02.9395516Z   inflating: build/bin/c10_CompileTimeFunctionPointer_test  
2025-12-04T09:05:02.9441558Z   inflating: build/bin/c10_DeviceGuard_test  
2025-12-04T09:05:02.9487390Z   inflating: build/bin/c10_Device_test  
2025-12-04T09:05:02.9540878Z   inflating: build/bin/c10_DispatchKeySet_test  
2025-12-04T09:05:02.9588841Z   inflating: build/bin/c10_Scalar_test  
2025-12-04T09:05:02.9632981Z   inflating: build/bin/c10_StreamGuard_test  
2025-12-04T09:05:02.9683825Z   inflating: build/bin/c10_SymInt_test  
2025-12-04T09:05:02.9732694Z   inflating: build/bin/c10_InlineDeviceGuard_test  
2025-12-04T09:05:02.9782393Z   inflating: build/bin/c10_InlineStreamGuard_test  
2025-12-04T09:05:02.9826958Z   inflating: build/bin/c10_ConstexprCrc_test  
2025-12-04T09:05:02.9876461Z   inflating: build/bin/c10_SizesAndStrides_test  
2025-12-04T09:05:02.9938306Z   inflating: build/bin/c10_cow_test  
2025-12-04T09:05:02.9985419Z   inflating: build/bin/c10_Bitset_test  
2025-12-04T09:05:03.0030261Z   inflating: build/bin/c10_ArrayRef_test  
2025-12-04T09:05:03.0074341Z   inflating: build/bin/c10_DeadlockDetection_test  
2025-12-04T09:05:03.0121340Z   inflating: build/bin/c10_IntrusiveList_test  
2025-12-04T09:05:03.0170978Z   inflating: build/bin/c10_LeftRight_test  
2025-12-04T09:05:03.0216720Z   inflating: build/bin/c10_Half_test  
2025-12-04T09:05:03.0260840Z   inflating: build/bin/c10_Semaphore_test  
2025-12-04T09:05:03.0311261Z   inflating: build/bin/c10_Enumerate_test  
2025-12-04T09:05:03.0358916Z   inflating: build/bin/c10_NetworkFlow_test  
2025-12-04T09:05:03.0403616Z   inflating: build/bin/c10_Synchronized_test  
2025-12-04T09:05:03.0453586Z   inflating: build/bin/c10_ThreadLocal_test  
2025-12-04T09:05:03.0499538Z   inflating: build/bin/c10_accumulate_test  
2025-12-04T09:05:03.0545945Z   inflating: build/bin/c10_TypeIndex_test  
2025-12-04T09:05:03.0590889Z   inflating: build/bin/c10_bit_cast_test  
2025-12-04T09:05:03.0642187Z   inflating: build/bin/c10_bfloat16_test  
2025-12-04T09:05:03.0692875Z   inflating: build/bin/c10_complex_math_test  
2025-12-04T09:05:03.0739794Z   inflating: build/bin/c10_exception_test  
2025-12-04T09:05:03.0784000Z   inflating: build/bin/c10_error_test  
2025-12-04T09:05:03.0833158Z   inflating: build/bin/c10_complex_test  
2025-12-04T09:05:03.0878016Z   inflating: build/bin/c10_flags_test  
2025-12-04T09:05:03.0923730Z   inflating: build/bin/c10_generic_math_test  
2025-12-04T09:05:03.1056705Z   inflating: build/bin/c10_intrusive_ptr_test  
2025-12-04T09:05:03.1102478Z   inflating: build/bin/c10_irange_test  
2025-12-04T09:05:03.1150151Z   inflating: build/bin/c10_lazy_test  
2025-12-04T09:05:03.1194646Z   inflating: build/bin/c10_nofatal_test  
2025-12-04T09:05:03.1245247Z   inflating: build/bin/c10_logging_test  
2025-12-04T09:05:03.1310214Z   inflating: build/bin/c10_optional_test  
2025-12-04T09:05:03.1365046Z   inflating: build/bin/c10_ordered_preserving_dict_test  
2025-12-04T09:05:03.1494597Z   inflating: build/bin/c10_small_vector_test  
2025-12-04T09:05:03.1542020Z   inflating: build/bin/c10_registry_test  
2025-12-04T09:05:03.1592008Z   inflating: build/bin/c10_string_util_test  
2025-12-04T09:05:03.1638253Z   inflating: build/bin/c10_ssize_test  
2025-12-04T09:05:03.1682309Z   inflating: build/bin/c10_string_view_test  
2025-12-04T09:05:03.1721560Z   inflating: build/bin/c10_intrusive_ptr_benchmark  
2025-12-04T09:05:03.1765784Z   inflating: build/bin/c10_tempfile_test  
2025-12-04T09:05:03.1815172Z   inflating: build/bin/c10_typeid_test  
2025-12-04T09:05:03.1862286Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test  
2025-12-04T09:05:03.1909587Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream  
2025-12-04T09:05:03.1956811Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads  
2025-12-04T09:05:03.2000881Z   inflating: build/bin/c10_cuda_CUDATest  
2025-12-04T09:05:03.2048101Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device  
2025-12-04T09:05:03.2094611Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes  
2025-12-04T09:05:03.2141983Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks  
2025-12-04T09:05:03.2188759Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block  
2025-12-04T09:05:03.2671382Z   inflating: build/bin/vec_test_all_types_DEFAULT  
2025-12-04T09:05:03.3166525Z   inflating: build/bin/vec_test_all_types_AVX512  
2025-12-04T09:05:03.3671635Z   inflating: build/bin/vec_test_all_types_AVX2  
2025-12-04T09:05:03.3715600Z   inflating: build/bin/test_vec_half_DEFAULT  
2025-12-04T09:05:03.3799571Z   inflating: build/bin/test_aoti_abi_check  
2025-12-04T09:05:03.3844298Z   inflating: build/bin/test_vec_half_AVX512  
2025-12-04T09:05:03.3888618Z   inflating: build/bin/test_vec_half_AVX2  
2025-12-04T09:05:03.3952180Z   inflating: build/bin/Dict_test     
2025-12-04T09:05:03.3999068Z   inflating: build/bin/Dimname_test  
2025-12-04T09:05:03.4056254Z   inflating: build/bin/MaybeOwned_test  
2025-12-04T09:05:03.4106398Z   inflating: build/bin/NamedTensor_test  
2025-12-04T09:05:03.4158423Z   inflating: build/bin/apply_utils_test  
2025-12-04T09:05:03.4210835Z   inflating: build/bin/atest         
2025-12-04T09:05:03.4266760Z   inflating: build/bin/basic         
2025-12-04T09:05:03.4314740Z   inflating: build/bin/broadcast_test  
2025-12-04T09:05:03.4360061Z   inflating: build/bin/cpu_allocator_test  
2025-12-04T09:05:03.4411119Z   inflating: build/bin/cpu_generator_test  
2025-12-04T09:05:03.4458035Z   inflating: build/bin/cpu_profiling_allocator_test  
2025-12-04T09:05:03.4538445Z   inflating: build/bin/cpu_rng_test  
2025-12-04T09:05:03.4584762Z   inflating: build/bin/dlconvertor_test  
2025-12-04T09:05:03.4636050Z   inflating: build/bin/extension_backend_test  
2025-12-04T09:05:03.4684933Z   inflating: build/bin/half_test     
2025-12-04T09:05:03.4768462Z   inflating: build/bin/ivalue_test   
2025-12-04T09:05:03.4812734Z   inflating: build/bin/lazy_tensor_test  
2025-12-04T09:05:03.4859751Z   inflating: build/bin/math_kernel_test  
2025-12-04T09:05:03.4906455Z   inflating: build/bin/memory_format_test  
2025-12-04T09:05:03.4954098Z   inflating: build/bin/memory_overlapping_test  
2025-12-04T09:05:03.5001401Z   inflating: build/bin/mobile_memory_cleanup  
2025-12-04T09:05:03.5051112Z   inflating: build/bin/native_test   
2025-12-04T09:05:03.5096469Z   inflating: build/bin/operator_name_test  
2025-12-04T09:05:03.5141896Z   inflating: build/bin/operators_test  
2025-12-04T09:05:03.5188238Z   inflating: build/bin/packedtensoraccessor_test  
2025-12-04T09:05:03.5247477Z   inflating: build/bin/pow_test      
2025-12-04T09:05:03.5297377Z   inflating: build/bin/quantized_test  
2025-12-04T09:05:03.5342317Z   inflating: build/bin/reduce_ops_test  
2025-12-04T09:05:03.5387744Z   inflating: build/bin/reportMemoryUsage_test  
2025-12-04T09:05:03.5436968Z   inflating: build/bin/scalar_tensor_test  
2025-12-04T09:05:03.5487631Z   inflating: build/bin/scalar_test   
2025-12-04T09:05:03.5533894Z   inflating: build/bin/StorageUtils_test  
2025-12-04T09:05:03.5580124Z   inflating: build/bin/stride_properties_test  
2025-12-04T09:05:03.5649690Z   inflating: build/bin/tensor_iterator_test  
2025-12-04T09:05:03.5697479Z   inflating: build/bin/test_parallel  
2025-12-04T09:05:03.5742902Z   inflating: build/bin/thread_init_test  
2025-12-04T09:05:03.5793022Z   inflating: build/bin/type_ptr_test  
2025-12-04T09:05:03.5845862Z   inflating: build/bin/type_test     
2025-12-04T09:05:03.5892485Z   inflating: build/bin/undefined_tensor_test  
2025-12-04T09:05:03.5936539Z   inflating: build/bin/verify_api_visibility  
2025-12-04T09:05:03.5998337Z   inflating: build/bin/legacy_vmap_test  
2025-12-04T09:05:03.6044228Z   inflating: build/bin/weakref_test  
2025-12-04T09:05:03.6090083Z   inflating: build/bin/wrapdim_test  
2025-12-04T09:05:03.6135723Z   inflating: build/bin/xla_tensor_test  
2025-12-04T09:05:03.6187980Z   inflating: build/bin/IListRef_test  
2025-12-04T09:05:03.6278389Z   inflating: build/bin/List_test     
2025-12-04T09:05:03.6336623Z   inflating: build/bin/KernelFunction_test  
2025-12-04T09:05:03.6438355Z   inflating: build/bin/kernel_function_legacy_test  
2025-12-04T09:05:03.6520563Z   inflating: build/bin/kernel_function_test  
2025-12-04T09:05:03.6628476Z   inflating: build/bin/kernel_lambda_legacy_test  
2025-12-04T09:05:03.6715587Z   inflating: build/bin/kernel_lambda_test  
2025-12-04T09:05:03.6768554Z   inflating: build/bin/kernel_stackbased_test  
2025-12-04T09:05:03.6850612Z   inflating: build/bin/make_boxed_from_unboxed_functor_test  
2025-12-04T09:05:03.6896118Z   inflating: build/bin/CppSignature_test  
2025-12-04T09:05:03.6944695Z   inflating: build/bin/backend_fallback_test  
2025-12-04T09:05:03.6988200Z   inflating: build/bin/op_allowlist_test  
2025-12-04T09:05:03.7249350Z   inflating: build/bin/op_registration_test  
2025-12-04T09:05:03.7307679Z   inflating: build/bin/inline_container_test  
2025-12-04T09:05:03.7355204Z   inflating: build/bin/cuda_allocator_test  
2025-12-04T09:05:03.7401851Z   inflating: build/bin/cuda_apply_test  
2025-12-04T09:05:03.7455225Z   inflating: build/bin/cuda_atomic_ops_test  
2025-12-04T09:05:03.7505082Z   inflating: build/bin/cuda_caching_host_allocator_test  
2025-12-04T09:05:03.7566724Z   inflating: build/bin/cuda_complex_math_test  
2025-12-04T09:05:03.7619265Z   inflating: build/bin/cuda_complex_test  
2025-12-04T09:05:03.7676194Z   inflating: build/bin/cuda_cub_test  
2025-12-04T09:05:03.7723539Z   inflating: build/bin/cuda_cublas_handle_pool_test  
2025-12-04T09:05:03.7767678Z   inflating: build/bin/cuda_device_test  
2025-12-04T09:05:03.7833561Z   inflating: build/bin/cuda_distributions_test  
2025-12-04T09:05:03.7880371Z   inflating: build/bin/cuda_dlconvertor_test  
2025-12-04T09:05:03.7928395Z   inflating: build/bin/cuda_event_test  
2025-12-04T09:05:03.7971928Z   inflating: build/bin/cuda_exchange_device_test  
2025-12-04T09:05:03.8022558Z   inflating: build/bin/cuda_generator_test  
2025-12-04T09:05:03.8067248Z   inflating: build/bin/cuda_half_test  
2025-12-04T09:05:03.8111777Z   inflating: build/bin/cuda_allocatorTraceTracker_test  
2025-12-04T09:05:03.8165914Z   inflating: build/bin/cuda_stream_test  
2025-12-04T09:05:03.8212738Z   inflating: build/bin/cuda_reportMemoryUsage_test  
2025-12-04T09:05:03.8256980Z   inflating: build/bin/cuda_cudnn_test  
2025-12-04T09:05:03.8302881Z   inflating: build/bin/cuda_integer_divider_test  
2025-12-04T09:05:03.8347420Z   inflating: build/bin/cuda_optional_test  
2025-12-04T09:05:03.8393987Z   inflating: build/bin/cuda_packedtensoraccessor_test  
2025-12-04T09:05:03.8440786Z   inflating: build/bin/cuda_vectorized_test  
2025-12-04T09:05:03.9340075Z   inflating: build/bin/test_jit      
2025-12-04T09:05:03.9635018Z   inflating: build/bin/test_lazy     
2025-12-04T09:05:03.9682340Z   inflating: build/bin/BackoffTest   
2025-12-04T09:05:03.9729801Z   inflating: build/bin/FileStoreTest  
2025-12-04T09:05:03.9780051Z   inflating: build/bin/TCPStoreTest  
2025-12-04T09:05:03.9828665Z   inflating: build/bin/HashStoreTest  
2025-12-04T09:05:03.9840726Z   inflating: build/bin/ProcessGroupMPITest  
2025-12-04T09:05:03.9843446Z   inflating: build/bin/example_allreduce  
2025-12-04T09:05:03.9892252Z   inflating: build/bin/test_dist_autograd  
2025-12-04T09:05:03.9951998Z   inflating: build/bin/test_cpp_rpc  
2025-12-04T09:05:04.0011450Z   inflating: build/bin/ProcessGroupGlooTest  
2025-12-04T09:05:04.0061663Z   inflating: build/bin/ProcessGroupGlooAsyncTest  
2025-12-04T09:05:04.0118865Z   inflating: build/bin/ProcessGroupNCCLTest  
2025-12-04T09:05:04.0173093Z   inflating: build/bin/ProcessGroupNCCLErrorsTest  
2025-12-04T09:05:04.1133540Z   inflating: build/bin/test_api      
2025-12-04T09:05:04.1135930Z   inflating: build/bin/parallel_benchmark  
2025-12-04T09:05:04.1139402Z   inflating: build/bin/torch_shm_manager  
2025-12-04T09:05:04.1139875Z    creating: .additional_ci_files/
2025-12-04T09:05:04.1192007Z   inflating: .additional_ci_files/test-times.json  
2025-12-04T09:05:04.1382288Z   inflating: .additional_ci_files/test-class-times.json  
2025-12-04T09:05:04.1444359Z ##[group]Run rm artifacts.zip
2025-12-04T09:05:04.1444627Z [36;1mrm artifacts.zip[0m
2025-12-04T09:05:04.1455033Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:05:04.1455307Z env:
2025-12-04T09:05:04.1455662Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:04.1455860Z   HAS_NVIDIA_GPU: true
2025-12-04T09:05:04.1456092Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:04.1456340Z ##[endgroup]
2025-12-04T09:05:04.3048492Z ##[group]Run df -H
2025-12-04T09:05:04.3048688Z [36;1mdf -H[0m
2025-12-04T09:05:04.3055662Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:05:04.3055946Z env:
2025-12-04T09:05:04.3056110Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:04.3056315Z   HAS_NVIDIA_GPU: true
2025-12-04T09:05:04.3056550Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:04.3056811Z ##[endgroup]
2025-12-04T09:05:04.3103434Z Filesystem        Size  Used Avail Use% Mounted on
2025-12-04T09:05:04.3103813Z devtmpfs          4.2M     0  4.2M   0% /dev
2025-12-04T09:05:04.3104119Z tmpfs              33G     0   33G   0% /dev/shm
2025-12-04T09:05:04.3104429Z tmpfs              13G  775k   13G   1% /run
2025-12-04T09:05:04.3104725Z /dev/nvme0n1p1    161G   54G  108G  34% /
2025-12-04T09:05:04.3105030Z tmpfs              33G   17k   33G   1% /tmp
2025-12-04T09:05:04.3105335Z /dev/nvme0n1p128   11M  1.4M  9.2M  13% /boot/efi
2025-12-04T09:05:04.3105663Z tmpfs             6.5G     0  6.5G   0% /run/user/0
2025-12-04T09:05:04.3134502Z Prepare all required actions
2025-12-04T09:05:04.3135305Z Getting action download info
2025-12-04T09:05:04.5614668Z ##[group]Run ./.github/actions/download-td-artifacts
2025-12-04T09:05:04.5614932Z with:
2025-12-04T09:05:04.5615090Z env:
2025-12-04T09:05:04.5615241Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:04.5615429Z   HAS_NVIDIA_GPU: true
2025-12-04T09:05:04.5615653Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:04.5615898Z ##[endgroup]
2025-12-04T09:05:04.5737645Z ##[group]Run seemethere/download-artifact-s3@v4
2025-12-04T09:05:04.5737890Z with:
2025-12-04T09:05:04.5738043Z   name: td_results
2025-12-04T09:05:04.5738210Z   s3-bucket: gha-artifacts
2025-12-04T09:05:04.5738401Z   region: us-east-1
2025-12-04T09:05:04.5738573Z env:
2025-12-04T09:05:04.5738720Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:04.5738913Z   HAS_NVIDIA_GPU: true
2025-12-04T09:05:04.5739141Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:04.5739533Z ##[endgroup]
2025-12-04T09:05:04.9732633Z (node:58831) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
2025-12-04T09:05:04.9733090Z 
2025-12-04T09:05:04.9733266Z Please migrate your code to use AWS SDK for JavaScript (v3).
2025-12-04T09:05:04.9733750Z For more information, check the migration guide at https://a.co/7PzMCcy
2025-12-04T09:05:04.9734253Z (Use `node --trace-warnings ...` to show where the warning was created)
2025-12-04T09:05:05.0669300Z Found 1 objects with prefix pytorch/pytorch/19922768520/td_results/
2025-12-04T09:05:05.0669898Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json
2025-12-04T09:05:05.1336919Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json
2025-12-04T09:05:05.1341772Z Artifact download has finished successfully
2025-12-04T09:05:05.1598564Z ##[group]Run mkdir -p .additional_ci_files
2025-12-04T09:05:05.1599031Z [36;1mmkdir -p .additional_ci_files[0m
2025-12-04T09:05:05.1599434Z [36;1mmv td_results.json .additional_ci_files/td_results.json || true[0m
2025-12-04T09:05:05.1608269Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:05:05.1608557Z env:
2025-12-04T09:05:05.1608746Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:05.1608957Z   HAS_NVIDIA_GPU: true
2025-12-04T09:05:05.1609210Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:05.1609474Z ##[endgroup]
2025-12-04T09:05:05.1724102Z ##[group]Run .github/scripts/parse_ref.py
2025-12-04T09:05:05.1724433Z [36;1m.github/scripts/parse_ref.py[0m
2025-12-04T09:05:05.1731411Z shell: /usr/bin/bash -e {0}
2025-12-04T09:05:05.1731610Z env:
2025-12-04T09:05:05.1731763Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:05.1731968Z   HAS_NVIDIA_GPU: true
2025-12-04T09:05:05.1732191Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:05.1732447Z ##[endgroup]
2025-12-04T09:05:05.1937922Z Setting output branch=main
2025-12-04T09:05:05.2034866Z Prepare all required actions
2025-12-04T09:05:05.2035193Z Getting action download info
2025-12-04T09:05:05.3588635Z ##[group]Run ./.github/actions/filter-test-configs
2025-12-04T09:05:05.3588913Z with:
2025-12-04T09:05:05.3589251Z   github-token: ***
2025-12-04T09:05:05.3596126Z   test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}
2025-12-04T09:05:05.3603691Z   job-name: linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check)
2025-12-04T09:05:05.3604144Z env:
2025-12-04T09:05:05.3604312Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:05.3604517Z   HAS_NVIDIA_GPU: true
2025-12-04T09:05:05.3604756Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:05.3605035Z ##[endgroup]
2025-12-04T09:05:05.3687211Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T09:05:05.3687485Z with:
2025-12-04T09:05:05.3687689Z   shell: bash
2025-12-04T09:05:05.3687858Z   timeout_minutes: 10
2025-12-04T09:05:05.3688046Z   max_attempts: 5
2025-12-04T09:05:05.3688224Z   retry_wait_seconds: 30
2025-12-04T09:05:05.3689010Z   command: set -eux
# PyYAML 6.0 doesn't work with MacOS x86 anymore
# This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2
python3 -m pip install requests==2.27.1 pyyaml==6.0.2

2025-12-04T09:05:05.3689690Z   polling_interval_seconds: 1
2025-12-04T09:05:05.3689912Z   warning_on_retry: true
2025-12-04T09:05:05.3690114Z   continue_on_error: false
2025-12-04T09:05:05.3690303Z env:
2025-12-04T09:05:05.3690468Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:05.3690671Z   HAS_NVIDIA_GPU: true
2025-12-04T09:05:05.3690911Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:05.3691333Z   GITHUB_TOKEN: ***
2025-12-04T09:05:05.3691522Z ##[endgroup]
2025-12-04T09:05:05.4607249Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2
2025-12-04T09:05:05.6749096Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T09:05:05.8351713Z Collecting requests==2.27.1
2025-12-04T09:05:05.8501584Z   Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
2025-12-04T09:05:06.0374759Z Collecting pyyaml==6.0.2
2025-12-04T09:05:06.0412534Z   Downloading PyYAML-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (737 kB)
2025-12-04T09:05:06.0640988Z Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (2.10)
2025-12-04T09:05:06.0644442Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (1.25.10)
2025-12-04T09:05:06.1097810Z Collecting certifi>=2017.4.17
2025-12-04T09:05:06.1134812Z   Downloading certifi-2025.11.12-py3-none-any.whl (159 kB)
2025-12-04T09:05:06.4713726Z Collecting charset-normalizer~=2.0.0
2025-12-04T09:05:06.4741842Z   Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
2025-12-04T09:05:06.5539777Z Installing collected packages: charset-normalizer, certifi, requests, pyyaml
2025-12-04T09:05:06.6685684Z Successfully installed certifi-2025.11.12 charset-normalizer-2.0.12 pyyaml-6.0.2 requests-2.27.1
2025-12-04T09:05:07.4391809Z Command completed after 1 attempt(s).
2025-12-04T09:05:07.4461908Z ##[group]Run set -x
2025-12-04T09:05:07.4462109Z [36;1mset -x[0m
2025-12-04T09:05:07.4462270Z [36;1m[0m
2025-12-04T09:05:07.4462549Z [36;1m# Use relative path here as this could be checked out anywhere, not necessarily[0m
2025-12-04T09:05:07.4462886Z [36;1m# in runner workspace[0m
2025-12-04T09:05:07.4463165Z [36;1mpython3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py"[0m
2025-12-04T09:05:07.4471683Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:05:07.4471955Z env:
2025-12-04T09:05:07.4472117Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:07.4472312Z   HAS_NVIDIA_GPU: true
2025-12-04T09:05:07.4472532Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:07.4472974Z ##[endgroup]
2025-12-04T09:05:07.4499870Z + python3 /home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py
2025-12-04T09:05:07.4665315Z Setting output branch=main
2025-12-04T09:05:07.4720386Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}"
2025-12-04T09:05:07.4720722Z [36;1mecho "Workflow: ${GITHUB_WORKFLOW}"[0m
2025-12-04T09:05:07.4720975Z [36;1mecho "Job name: ${JOB_NAME}"[0m
2025-12-04T09:05:07.4721197Z [36;1m[0m
2025-12-04T09:05:07.4721478Z [36;1m# Use relative path here as this could be checked out anywhere, not necessarily[0m
2025-12-04T09:05:07.4721856Z [36;1m# in runner workspace[0m
2025-12-04T09:05:07.4722185Z [36;1mpython3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \[0m
2025-12-04T09:05:07.4722542Z [36;1m  --workflow "${GITHUB_WORKFLOW}" \[0m
2025-12-04T09:05:07.4722805Z [36;1m  --job-name "${JOB_NAME}" \[0m
2025-12-04T09:05:07.4729972Z [36;1m  --test-matrix "{"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}" \[0m
2025-12-04T09:05:07.4737087Z [36;1m  --selected-test-configs "" \[0m
2025-12-04T09:05:07.4737349Z [36;1m  --pr-number "${PR_NUMBER}" \[0m
2025-12-04T09:05:07.4737719Z [36;1m  --tag "${TAG}" \[0m
2025-12-04T09:05:07.4737935Z [36;1m  --event-name "${EVENT_NAME}" \[0m
2025-12-04T09:05:07.4738183Z [36;1m  --schedule "${SCHEDULE}" \[0m
2025-12-04T09:05:07.4738425Z [36;1m  --branch "${HEAD_BRANCH}"[0m
2025-12-04T09:05:07.4746082Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:05:07.4746370Z env:
2025-12-04T09:05:07.4746538Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:07.4746729Z   HAS_NVIDIA_GPU: true
2025-12-04T09:05:07.4746978Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:07.4747443Z   GITHUB_TOKEN: ***
2025-12-04T09:05:07.4747892Z   JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check)
2025-12-04T09:05:07.4748351Z   PR_NUMBER: 
2025-12-04T09:05:07.4748520Z   TAG: 
2025-12-04T09:05:07.4748678Z   EVENT_NAME: schedule
2025-12-04T09:05:07.4748863Z   SCHEDULE: 29 8 * * *
2025-12-04T09:05:07.4749046Z   HEAD_BRANCH: main
2025-12-04T09:05:07.4749222Z ##[endgroup]
2025-12-04T09:05:07.4773657Z Workflow: trunk
2025-12-04T09:05:07.4774518Z Job name: linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check)
2025-12-04T09:05:07.6643485Z Setting output keep-going=True
2025-12-04T09:05:07.6643987Z Setting output ci-verbose-test-logs=False
2025-12-04T09:05:07.6644506Z Setting output ci-test-showlocals=False
2025-12-04T09:05:07.6644913Z Setting output ci-no-test-timeout=False
2025-12-04T09:05:07.6645190Z Setting output ci-no-td=False
2025-12-04T09:05:07.6645470Z Setting output ci-td-distributed=False
2025-12-04T09:05:07.6645758Z Setting output is-unstable=False
2025-12-04T09:05:07.6646016Z Setting output reenabled-issues=
2025-12-04T09:05:07.6662471Z Setting output test-matrix={"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}
2025-12-04T09:05:07.6678079Z Setting output is-test-matrix-empty=False
2025-12-04T09:05:07.6745422Z ##[group]Run echo "Filtered matrix:"
2025-12-04T09:05:07.6745724Z [36;1mecho "Filtered matrix:"[0m
2025-12-04T09:05:07.6761385Z [36;1mecho "{"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}"[0m
2025-12-04T09:05:07.6777053Z [36;1m[0m
2025-12-04T09:05:07.6777218Z [36;1mecho[0m
2025-12-04T09:05:07.6777429Z [36;1mecho "Is the current job unstable? False"[0m
2025-12-04T09:05:07.6777678Z [36;1m[0m
2025-12-04T09:05:07.6777835Z [36;1mecho[0m
2025-12-04T09:05:07.6778027Z [36;1mecho "Is keep-going label set? True"[0m
2025-12-04T09:05:07.6778259Z [36;1m[0m
2025-12-04T09:05:07.6778423Z [36;1mecho[0m
2025-12-04T09:05:07.6778606Z [36;1mecho "Reenabled issues? "[0m
2025-12-04T09:05:07.6785943Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:05:07.6786240Z env:
2025-12-04T09:05:07.6786509Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:07.6786707Z   HAS_NVIDIA_GPU: true
2025-12-04T09:05:07.6786945Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:07.6787207Z ##[endgroup]
2025-12-04T09:05:07.6812789Z Filtered matrix:
2025-12-04T09:05:07.6833916Z {include: [{config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}]}
2025-12-04T09:05:07.6849456Z 
2025-12-04T09:05:07.6849556Z Is the current job unstable? False
2025-12-04T09:05:07.6849710Z 
2025-12-04T09:05:07.6849799Z Is keep-going label set? True
2025-12-04T09:05:07.6849935Z 
2025-12-04T09:05:07.6850003Z Reenabled issues? 
2025-12-04T09:05:07.6876520Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}"
2025-12-04T09:05:07.6876949Z [36;1mecho "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:05:07.6883991Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:05:07.6884275Z env:
2025-12-04T09:05:07.6884435Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:07.6884629Z   HAS_NVIDIA_GPU: true
2025-12-04T09:05:07.6884861Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:07.6885124Z   JOB_TIMEOUT: 600
2025-12-04T09:05:07.6885294Z ##[endgroup]
2025-12-04T09:05:07.6939110Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:05:07.6939545Z [36;1menv | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"[0m
2025-12-04T09:05:07.6939883Z [36;1menv | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}"[0m
2025-12-04T09:05:07.6946646Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:05:07.6946915Z env:
2025-12-04T09:05:07.6947097Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:07.6947291Z   HAS_NVIDIA_GPU: true
2025-12-04T09:05:07.6947526Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:07.6947788Z ##[endgroup]
2025-12-04T09:05:07.7042281Z ##[group]Run set -x
2025-12-04T09:05:07.7042556Z [36;1mset -x[0m
2025-12-04T09:05:07.7042740Z [36;1m[0m
2025-12-04T09:05:07.7042933Z [36;1mif [[ $TEST_CONFIG == 'multigpu' ]]; then[0m
2025-12-04T09:05:07.7043218Z [36;1m  TEST_COMMAND=.ci/pytorch/multigpu-test.sh[0m
2025-12-04T09:05:07.7043675Z [36;1melif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then[0m
2025-12-04T09:05:07.7043938Z [36;1m  TEST_COMMAND=.ci/onnx/test.sh[0m
2025-12-04T09:05:07.7044153Z [36;1melse[0m
2025-12-04T09:05:07.7044344Z [36;1m  TEST_COMMAND=.ci/pytorch/test.sh[0m
2025-12-04T09:05:07.7044564Z [36;1mfi[0m
2025-12-04T09:05:07.7044708Z [36;1m[0m
2025-12-04T09:05:07.7044887Z [36;1m# Leaving 1GB for the runner and other things[0m
2025-12-04T09:05:07.7045313Z [36;1mTOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo)[0m
2025-12-04T09:05:07.7045960Z [36;1m# https://docs.docker.com/engine/containers/resource_constraints/#--memory-swap-details, the 3GB swap[0m
2025-12-04T09:05:07.7046460Z [36;1m# comes from https://github.com/pytorch/test-infra/pull/6058[0m
2025-12-04T09:05:07.7046851Z [36;1mTOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3))[0m
2025-12-04T09:05:07.7047148Z [36;1m[0m
2025-12-04T09:05:07.7047335Z [36;1mif [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then[0m
2025-12-04T09:05:07.7047578Z [36;1m  SHM_OPTS=[0m
2025-12-04T09:05:07.7047754Z [36;1m  JENKINS_USER=[0m
2025-12-04T09:05:07.7048001Z [36;1m  # ensure that docker container cleanly exits in 12 hours[0m
2025-12-04T09:05:07.7048329Z [36;1m  # if for some reason cleanup action doesn't stop container[0m
2025-12-04T09:05:07.7048606Z [36;1m  # when job is cancelled[0m
2025-12-04T09:05:07.7048819Z [36;1m  DOCKER_SHELL_CMD="sleep 12h"[0m
2025-12-04T09:05:07.7049046Z [36;1m  USED_IMAGE="${DOCKER_IMAGE_S390X}"[0m
2025-12-04T09:05:07.7049261Z [36;1melse[0m
2025-12-04T09:05:07.7049446Z [36;1m  SHM_OPTS="--shm-size=${SHM_SIZE}"[0m
2025-12-04T09:05:07.7049693Z [36;1m  JENKINS_USER="--user jenkins"[0m
2025-12-04T09:05:07.7049912Z [36;1m  DOCKER_SHELL_CMD=[0m
2025-12-04T09:05:07.7050114Z [36;1m  USED_IMAGE="${DOCKER_IMAGE}"[0m
2025-12-04T09:05:07.7050315Z [36;1mfi[0m
2025-12-04T09:05:07.7050453Z [36;1m[0m
2025-12-04T09:05:07.7050690Z [36;1m# detached container should get cleaned up by teardown_ec2_linux[0m
2025-12-04T09:05:07.7051061Z [36;1m# TODO: Stop building test binaries as part of the build phase[0m
2025-12-04T09:05:07.7051478Z [36;1m# Used for GPU_FLAG, SHM_OPTS, JENKINS_USER and DOCKER_SHELL_CMD since that doesn't play nice[0m
2025-12-04T09:05:07.7051849Z [36;1m# shellcheck disable=SC2086,SC2090[0m
2025-12-04T09:05:07.7052081Z [36;1mcontainer_name=$(docker run \[0m
2025-12-04T09:05:07.7052301Z [36;1m  ${GPU_FLAG:-} \[0m
2025-12-04T09:05:07.7052510Z [36;1m  ${SCCACHE_SERVER_PORT_DOCKER_FLAG:-} \[0m
2025-12-04T09:05:07.7052754Z [36;1m  -e BUILD_ENVIRONMENT \[0m
2025-12-04T09:05:07.7052962Z [36;1m  -e PR_NUMBER \[0m
2025-12-04T09:05:07.7053152Z [36;1m  -e GITHUB_ACTIONS \[0m
2025-12-04T09:05:07.7053354Z [36;1m  -e GITHUB_REPOSITORY \[0m
2025-12-04T09:05:07.7053565Z [36;1m  -e GITHUB_WORKFLOW \[0m
2025-12-04T09:05:07.7053761Z [36;1m  -e GITHUB_JOB \[0m
2025-12-04T09:05:07.7053951Z [36;1m  -e GITHUB_RUN_ID \[0m
2025-12-04T09:05:07.7054153Z [36;1m  -e GITHUB_RUN_NUMBER \[0m
2025-12-04T09:05:07.7054360Z [36;1m  -e GITHUB_RUN_ATTEMPT \[0m
2025-12-04T09:05:07.7054570Z [36;1m  -e JOB_ID \[0m
2025-12-04T09:05:07.7054758Z [36;1m  -e JOB_NAME \[0m
2025-12-04T09:05:07.7054940Z [36;1m  -e BASE_SHA \[0m
2025-12-04T09:05:07.7055109Z [36;1m  -e BRANCH \[0m
2025-12-04T09:05:07.7055285Z [36;1m  -e SHA1 \[0m
2025-12-04T09:05:07.7055463Z [36;1m  -e AWS_DEFAULT_REGION \[0m
2025-12-04T09:05:07.7055661Z [36;1m  -e IN_WHEEL_TEST \[0m
2025-12-04T09:05:07.7055854Z [36;1m  -e SHARD_NUMBER \[0m
2025-12-04T09:05:07.7056050Z [36;1m  -e TEST_CONFIG \[0m
2025-12-04T09:05:07.7056239Z [36;1m  -e NUM_TEST_SHARDS \[0m
2025-12-04T09:05:07.7056561Z [36;1m  -e REENABLED_ISSUES \[0m
2025-12-04T09:05:07.7056781Z [36;1m  -e CONTINUE_THROUGH_ERROR \[0m
2025-12-04T09:05:07.7057011Z [36;1m  -e VERBOSE_TEST_LOGS \[0m
2025-12-04T09:05:07.7057208Z [36;1m  -e TEST_SHOWLOCALS \[0m
2025-12-04T09:05:07.7057474Z [36;1m  -e NO_TEST_TIMEOUT \[0m
2025-12-04T09:05:07.7057663Z [36;1m  -e NO_TD \[0m
2025-12-04T09:05:07.7057837Z [36;1m  -e TD_DISTRIBUTED \[0m
2025-12-04T09:05:07.7058034Z [36;1m  -e PR_LABELS \[0m
2025-12-04T09:05:07.7058240Z [36;1m  -e MAX_JOBS="$(nproc --ignore=2)" \[0m
2025-12-04T09:05:07.7058467Z [36;1m  -e SCCACHE_BUCKET \[0m
2025-12-04T09:05:07.7058659Z [36;1m  -e SCCACHE_REGION \[0m
2025-12-04T09:05:07.7058845Z [36;1m  -e XLA_CUDA \[0m
2025-12-04T09:05:07.7059042Z [36;1m  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \[0m
2025-12-04T09:05:07.7059292Z [36;1m  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \[0m
2025-12-04T09:05:07.7059551Z [36;1m  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \[0m
2025-12-04T09:05:07.7059822Z [36;1m  -e SKIP_SCCACHE_INITIALIZATION=1 \[0m
2025-12-04T09:05:07.7060055Z [36;1m  -e HUGGING_FACE_HUB_TOKEN \[0m
2025-12-04T09:05:07.7060284Z [36;1m  -e VLLM_TEST_HUGGING_FACE_TOKEN \[0m
2025-12-04T09:05:07.7060521Z [36;1m  -e SCRIBE_GRAPHQL_ACCESS_TOKEN \[0m
2025-12-04T09:05:07.7060741Z [36;1m  -e DASHBOARD_TAG \[0m
2025-12-04T09:05:07.7060941Z [36;1m  -e ARTIFACTS_FILE_SUFFIX \[0m
2025-12-04T09:05:07.7061192Z [36;1m  --memory="${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g" \[0m
2025-12-04T09:05:07.7061483Z [36;1m  --memory-swap="${TOTAL_MEMORY_WITH_SWAP}g" \[0m
2025-12-04T09:05:07.7061767Z [36;1m  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \[0m
2025-12-04T09:05:07.7062039Z [36;1m  --security-opt seccomp=unconfined \[0m
2025-12-04T09:05:07.7062281Z [36;1m  --cap-add=SYS_PTRACE \[0m
2025-12-04T09:05:07.7062479Z [36;1m  --ipc=host \[0m
2025-12-04T09:05:07.7062660Z [36;1m  ${SHM_OPTS} \[0m
2025-12-04T09:05:07.7062835Z [36;1m  --tty \[0m
2025-12-04T09:05:07.7062997Z [36;1m  --detach \[0m
2025-12-04T09:05:07.7063186Z [36;1m  --name="${container_name}" \[0m
2025-12-04T09:05:07.7063410Z [36;1m  ${JENKINS_USER} \[0m
2025-12-04T09:05:07.7063651Z [36;1m  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \[0m
2025-12-04T09:05:07.7063929Z [36;1m  -w /var/lib/jenkins/workspace \[0m
2025-12-04T09:05:07.7064154Z [36;1m  "${USED_IMAGE}" \[0m
2025-12-04T09:05:07.7064345Z [36;1m  ${DOCKER_SHELL_CMD}[0m
2025-12-04T09:05:07.7064521Z [36;1m)[0m
2025-12-04T09:05:07.7064755Z [36;1mecho "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}"[0m
2025-12-04T09:05:07.7065039Z [36;1m[0m
2025-12-04T09:05:07.7065224Z [36;1mif [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then[0m
2025-12-04T09:05:07.7065648Z [36;1m  docker exec -t "${container_name}" sh -c "python3 -m pip install -r .ci/docker/requirements-ci.txt"[0m
2025-12-04T09:05:07.7066011Z [36;1mfi[0m
2025-12-04T09:05:07.7066171Z [36;1m[0m
2025-12-04T09:05:07.7066512Z [36;1mdocker exec -t "${container_name}" sh -c "python3 -m pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}"[0m
2025-12-04T09:05:07.7073360Z shell: /usr/bin/bash -e {0}
2025-12-04T09:05:07.7073566Z env:
2025-12-04T09:05:07.7073715Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:07.7073913Z   HAS_NVIDIA_GPU: true
2025-12-04T09:05:07.7074152Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:07.7074474Z   BUILD_ENVIRONMENT: linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T09:05:07.7074725Z   PR_NUMBER: 
2025-12-04T09:05:07.7074902Z   GITHUB_REPOSITORY: pytorch/pytorch
2025-12-04T09:05:07.7075122Z   GITHUB_WORKFLOW: trunk
2025-12-04T09:05:07.7075295Z   GITHUB_JOB: test
2025-12-04T09:05:07.7075478Z   GITHUB_RUN_ID: 19922768520
2025-12-04T09:05:07.7075677Z   GITHUB_RUN_NUMBER: 158165
2025-12-04T09:05:07.7075863Z   GITHUB_RUN_ATTEMPT: 1
2025-12-04T09:05:07.7076037Z   JOB_ID: 57116084862
2025-12-04T09:05:07.7076453Z   JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check)
2025-12-04T09:05:07.7077007Z   BRANCH: main
2025-12-04T09:05:07.7077203Z   SHA1: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:05:07.7077473Z   BASE_SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:05:07.7077717Z   TEST_CONFIG: default
2025-12-04T09:05:07.7077881Z   SHARD_NUMBER: 2
2025-12-04T09:05:07.7078115Z   NUM_TEST_SHARDS: 5
2025-12-04T09:05:07.7078279Z   EXTRA_FLAGS: 
2025-12-04T09:05:07.7078445Z   OP_BENCHMARK_TESTS: 
2025-12-04T09:05:07.7078619Z   REENABLED_ISSUES: 
2025-12-04T09:05:07.7078794Z   CONTINUE_THROUGH_ERROR: True
2025-12-04T09:05:07.7078988Z   VERBOSE_TEST_LOGS: False
2025-12-04T09:05:07.7079173Z   TEST_SHOWLOCALS: False
2025-12-04T09:05:07.7079351Z   NO_TEST_TIMEOUT: False
2025-12-04T09:05:07.7079517Z   NO_TD: False
2025-12-04T09:05:07.7079670Z   TD_DISTRIBUTED: False
2025-12-04T09:05:07.7079984Z   SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
2025-12-04T09:05:07.7080245Z   SCCACHE_REGION: us-east-1
2025-12-04T09:05:07.7080418Z   SHM_SIZE: 2g
2025-12-04T09:05:07.7080972Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:05:07.7081957Z   DOCKER_IMAGE_S390X: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:05:07.7082559Z   XLA_CUDA: 
2025-12-04T09:05:07.7082803Z   XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
2025-12-04T09:05:07.7083137Z   PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 1
2025-12-04T09:05:07.7083374Z   PYTORCH_TEST_RERUN_DISABLED_TESTS: 0
2025-12-04T09:05:07.7083579Z   DASHBOARD_TAG: 
2025-12-04T09:05:07.7083915Z   VLLM_TEST_HUGGING_FACE_TOKEN: ***
2025-12-04T09:05:07.7084224Z   HUGGING_FACE_HUB_TOKEN: ***
2025-12-04T09:05:07.7084516Z   SCRIBE_GRAPHQL_ACCESS_TOKEN: ***
2025-12-04T09:05:07.7084909Z   ARTIFACTS_FILE_SUFFIX: test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862
2025-12-04T09:05:07.7085292Z ##[endgroup]
2025-12-04T09:05:07.7114020Z + [[ default == \m\u\l\t\i\g\p\u ]]
2025-12-04T09:05:07.7114339Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *onnx* ]]
2025-12-04T09:05:07.7114615Z + TEST_COMMAND=.ci/pytorch/test.sh
2025-12-04T09:05:07.7117490Z ++ awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo
2025-12-04T09:05:07.7138672Z + TOTAL_AVAILABLE_MEMORY_IN_GB='59.453 '
2025-12-04T09:05:07.7139193Z + TOTAL_MEMORY_WITH_SWAP=62
2025-12-04T09:05:07.7139530Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *\s\3\9\0\x* ]]
2025-12-04T09:05:07.7139860Z + SHM_OPTS=--shm-size=2g
2025-12-04T09:05:07.7140109Z + JENKINS_USER='--user jenkins'
2025-12-04T09:05:07.7140349Z + DOCKER_SHELL_CMD=
2025-12-04T09:05:07.7141057Z + USED_IMAGE=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:05:07.7147533Z +++ nproc --ignore=2
2025-12-04T09:05:07.7180560Z ++ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e TD_DISTRIBUTED -e PR_LABELS -e MAX_JOBS=14 -e SCCACHE_BUCKET -e SCCACHE_REGION -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 -e HUGGING_FACE_HUB_TOKEN -e VLLM_TEST_HUGGING_FACE_TOKEN -e SCRIBE_GRAPHQL_ACCESS_TOKEN -e DASHBOARD_TAG -e ARTIFACTS_FILE_SUFFIX --memory=59g --memory-swap=62g --env-file=/tmp/github_env_19922768520 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:05:16.8681044Z + container_name=e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T09:05:16.8682168Z + echo DOCKER_CONTAINER_ID=e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T09:05:16.8682723Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *\s\3\9\0\x* ]]
2025-12-04T09:05:16.8685620Z ++ echo dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl
2025-12-04T09:05:16.8688207Z + docker exec -t e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82 sh -c 'python3 -m pip install dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl[opt-einsum] && .ci/pytorch/test.sh'
2025-12-04T09:05:17.2744254Z Processing ./dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl (from torch==2.10.0a0+gitffd9b0f)
2025-12-04T09:05:17.6297801Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.18.0)
2025-12-04T09:05:17.6300838Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (4.12.2)
2025-12-04T09:05:17.6304807Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.13.3)
2025-12-04T09:05:17.6309072Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2.8.8)
2025-12-04T09:05:17.6312347Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.1.6)
2025-12-04T09:05:17.6316627Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2025.10.0)
2025-12-04T09:05:17.6329691Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.3.0)
2025-12-04T09:05:17.6660890Z Requirement already satisfied: numpy>=1.7 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.22.4)
2025-12-04T09:05:17.6678385Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.3.0)
2025-12-04T09:05:17.6731006Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.0.3)
2025-12-04T09:05:18.0302670Z Installing collected packages: torch
2025-12-04T09:05:28.8959827Z Successfully installed torch-2.10.0a0+gitffd9b0f
2025-12-04T09:05:28.9471841Z + export TERM=vt100
2025-12-04T09:05:28.9472168Z + TERM=vt100
2025-12-04T09:05:28.9473917Z ++ dirname .ci/pytorch/test.sh
2025-12-04T09:05:28.9483441Z + source .ci/pytorch/common.sh
2025-12-04T09:05:28.9487460Z +++ dirname .ci/pytorch/common.sh
2025-12-04T09:05:28.9496161Z ++ source .ci/pytorch/common_utils.sh
2025-12-04T09:05:28.9497422Z +++ declare -f -t trap_add
2025-12-04T09:05:28.9502595Z ++ set -ex -o pipefail
2025-12-04T09:05:28.9502973Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]]
2025-12-04T09:05:28.9503279Z ++ BUILD_TEST_LIBTORCH=0
2025-12-04T09:05:28.9507002Z ++ dirname .ci/pytorch/test.sh
2025-12-04T09:05:28.9513990Z + source .ci/pytorch/common-build.sh
2025-12-04T09:05:28.9515691Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 != *win-* ]]
2025-12-04T09:05:28.9521570Z ++++ dirname .ci/pytorch/common-build.sh
2025-12-04T09:05:28.9529719Z +++ cd .ci/pytorch
2025-12-04T09:05:28.9529967Z +++ pwd -P
2025-12-04T09:05:28.9531933Z ++ script_dir=/var/lib/jenkins/workspace/.ci/pytorch
2025-12-04T09:05:28.9532343Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-pch* ]]
2025-12-04T09:05:28.9532646Z ++ which sccache
2025-12-04T09:05:28.9545439Z ++ [[ -z ossci-compiler-cache-circleci-v2 ]]
2025-12-04T09:05:28.9547455Z ++ sccache --stop-server
2025-12-04T09:05:28.9574665Z ++ true
2025-12-04T09:05:28.9574908Z ++ rm -f /var/lib/jenkins/sccache_error.log
2025-12-04T09:05:28.9586122Z ++ trap_add sccache_epilogue EXIT
2025-12-04T09:05:28.9586411Z ++ trap_add_cmd=sccache_epilogue
2025-12-04T09:05:28.9586656Z ++ shift
2025-12-04T09:05:28.9586853Z ++ for trap_add_name in "$@"
2025-12-04T09:05:28.9592884Z ++++ trap -p EXIT
2025-12-04T09:05:28.9595998Z +++ eval 'extract_trap_cmd '
2025-12-04T09:05:28.9596247Z ++++ extract_trap_cmd
2025-12-04T09:05:28.9596634Z ++++ printf '%s\n' ''
2025-12-04T09:05:28.9596893Z +++ printf '%s\n' sccache_epilogue
2025-12-04T09:05:28.9599099Z ++ trap -- '
2025-12-04T09:05:28.9599307Z sccache_epilogue' EXIT
2025-12-04T09:05:28.9599519Z ++ [[ -n 1 ]]
2025-12-04T09:05:28.9599869Z ++ echo 'Skipping sccache server initialization, setting environment variables'
2025-12-04T09:05:28.9600498Z Skipping sccache server initialization, setting environment variables
2025-12-04T09:05:28.9600899Z ++ export SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:05:28.9601154Z ++ SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:05:28.9601448Z ++ export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:05:28.9601829Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:05:28.9607982Z ++ export RUST_LOG=sccache::server=error
2025-12-04T09:05:28.9608279Z ++ RUST_LOG=sccache::server=error
2025-12-04T09:05:28.9608498Z ++ sccache --zero-stats
2025-12-04T09:05:29.1293726Z Statistics zeroed.
2025-12-04T09:05:29.1298714Z ++ which ccache
2025-12-04T09:05:29.1311775Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *rocm* ]]
2025-12-04T09:05:29.1312265Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *s390x* ]]
2025-12-04T09:05:29.1312645Z + [[ -d /var/lib/jenkins/workspace ]]
2025-12-04T09:05:29.1314811Z ++ stat -c %u /var/lib/jenkins/workspace
2025-12-04T09:05:29.1332016Z + WORKSPACE_ORIGINAL_OWNER_ID=1000
2025-12-04T09:05:29.1342796Z + trap_add cleanup_workspace EXIT
2025-12-04T09:05:29.1343156Z + trap_add_cmd=cleanup_workspace
2025-12-04T09:05:29.1343378Z + shift
2025-12-04T09:05:29.1343553Z + for trap_add_name in "$@"
2025-12-04T09:05:29.1343766Z +++ trap -p EXIT
2025-12-04T09:05:29.1343951Z ++ eval 'extract_trap_cmd trap -- '\''
2025-12-04T09:05:29.1344179Z sccache_epilogue'\'' EXIT'
2025-12-04T09:05:29.1344379Z +++ extract_trap_cmd trap -- '
2025-12-04T09:05:29.1344573Z sccache_epilogue' EXIT
2025-12-04T09:05:29.1344747Z +++ printf '%s\n' '
2025-12-04T09:05:29.1344914Z sccache_epilogue'
2025-12-04T09:05:29.1345091Z ++ printf '%s\n' cleanup_workspace
2025-12-04T09:05:29.1345400Z + trap -- '
2025-12-04T09:05:29.1345642Z sccache_epilogue
2025-12-04T09:05:29.1345839Z cleanup_workspace' EXIT
2025-12-04T09:05:29.1346072Z + sudo chown -R jenkins /var/lib/jenkins/workspace
2025-12-04T09:05:30.0563731Z + git config --global --add safe.directory /var/lib/jenkins/workspace
2025-12-04T09:05:30.0581669Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]]
2025-12-04T09:05:30.0584769Z ++ python -c 'import os;import numba.cuda; print(os.path.dirname(numba.cuda.__file__))'
2025-12-04T09:05:30.4261949Z + NUMBA_CUDA_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:05:30.4262593Z + '[' -n /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ']'
2025-12-04T09:05:30.4268096Z +++ realpath .ci/pytorch/test.sh
2025-12-04T09:05:30.4278546Z ++ dirname /var/lib/jenkins/workspace/.ci/pytorch/test.sh
2025-12-04T09:05:30.4286513Z + NUMBA_PATCH=/var/lib/jenkins/workspace/.ci/pytorch/numba-cuda-13.patch
2025-12-04T09:05:30.4286985Z + pushd /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:05:30.4287763Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ~/workspace
2025-12-04T09:05:30.4288116Z + patch -p4
2025-12-04T09:05:30.4302491Z patching file cudadrv/driver.py
2025-12-04T09:05:30.4302797Z Hunk #1 succeeded at 357 (offset -8 lines).
2025-12-04T09:05:30.4358086Z + popd
2025-12-04T09:05:30.4359140Z ~/workspace
2025-12-04T09:05:30.4359741Z + echo 'Environment variables:'
2025-12-04T09:05:30.4360656Z Environment variables:
2025-12-04T09:05:30.4361242Z + env
2025-12-04T09:05:30.4375321Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:05:30.4375906Z CONTINUE_THROUGH_ERROR=True
2025-12-04T09:05:30.4376457Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T09:05:30.4377197Z VLLM_TEST_HUGGING_FACE_TOKEN=***
2025-12-04T09:05:30.4377593Z HOSTNAME=e29498c26bf7
2025-12-04T09:05:30.4378337Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_68cc5a94-c390-4b58-b946-14203b5a4e58
2025-12-04T09:05:30.4379332Z GITHUB_ACTION=__run_3
2025-12-04T09:05:30.4379757Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1
2025-12-04T09:05:30.4380238Z GITHUB_RUN_NUMBER=158165
2025-12-04T09:05:30.4380615Z TEST_CONFIG=default
2025-12-04T09:05:30.4380958Z GITHUB_REPOSITORY_OWNER_ID=21003710
2025-12-04T09:05:30.4381250Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all
2025-12-04T09:05:30.4381547Z SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:05:30.4381939Z SCRIBE_GRAPHQL_ACCESS_TOKEN=***
2025-12-04T09:05:30.4382214Z GITHUB_TRIGGERING_ACTOR=huydhn
2025-12-04T09:05:30.4382455Z GITHUB_REF_TYPE=branch
2025-12-04T09:05:30.4382733Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:05:30.4383036Z XLA_CUDA=
2025-12-04T09:05:30.4383340Z NCCL_LIB_DIR=/usr/local/cuda/lib64/
2025-12-04T09:05:30.4383810Z HUGGING_FACE_HUB_TOKEN=***
2025-12-04T09:05:30.4384368Z ***
2025-12-04T09:05:30.4384565Z GITHUB_REPOSITORY_ID=65600975
2025-12-04T09:05:30.4384832Z GITHUB_ACTIONS=true
2025-12-04T09:05:30.4385051Z NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:30.4385349Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:05:30.4385712Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:05:30.4386045Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:05:30.4386449Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/heads/main
2025-12-04T09:05:30.4386794Z UCC_HOME=/usr
2025-12-04T09:05:30.4386949Z VERBOSE_TEST_LOGS=False
2025-12-04T09:05:30.4387134Z GITHUB_REF=refs/heads/main
2025-12-04T09:05:30.4387313Z SHARD_NUMBER=2
2025-12-04T09:05:30.4387470Z GITHUB_REF_PROTECTED=true
2025-12-04T09:05:30.4387663Z HOME=/var/lib/jenkins
2025-12-04T09:05:30.4387869Z GITHUB_API_URL=https://api.github.com
2025-12-04T09:05:30.4388107Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0
2025-12-04T09:05:30.4388350Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152
2025-12-04T09:05:30.4388587Z USE_SYSTEM_NCCL=1
2025-12-04T09:05:30.4388740Z NUM_TEST_SHARDS=5
2025-12-04T09:05:30.4388899Z UCX_HOME=/usr
2025-12-04T09:05:30.4389318Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_68cc5a94-c390-4b58-b946-14203b5a4e58
2025-12-04T09:05:30.4390020Z JOB_NAME=linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check)
2025-12-04T09:05:30.4390687Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_68cc5a94-c390-4b58-b946-14203b5a4e58
2025-12-04T09:05:30.4391258Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json
2025-12-04T09:05:30.4391615Z GITHUB_EVENT_NAME=schedule
2025-12-04T09:05:30.4391801Z DASHBOARD_TAG=
2025-12-04T09:05:30.4391964Z GITHUB_RUN_ID=19922768520
2025-12-04T09:05:30.4392144Z INSTALLED_OPENBLAS=
2025-12-04T09:05:30.4392561Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_68cc5a94-c390-4b58-b946-14203b5a4e58
2025-12-04T09:05:30.4393029Z GITHUB_ACTOR=huydhn
2025-12-04T09:05:30.4393183Z PR_NUMBER=
2025-12-04T09:05:30.4393329Z DESIRED_CUDA=12.8.1
2025-12-04T09:05:30.4393489Z GITHUB_RUN_ATTEMPT=1
2025-12-04T09:05:30.4393878Z ANACONDA_PYTHON_VERSION=3.10
2025-12-04T09:05:30.4394126Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql
2025-12-04T09:05:30.4394365Z TERM=vt100
2025-12-04T09:05:30.4394513Z INSTALLED_VISION=yes
2025-12-04T09:05:30.4394677Z BRANCH=main
2025-12-04T09:05:30.4394933Z SCCACHE_REGION=us-east-1
2025-12-04T09:05:30.4395125Z OPENSSL_ROOT_DIR=/opt/openssl
2025-12-04T09:05:30.4395331Z BUILD_AOT_INDUCTOR_TEST=
2025-12-04T09:05:30.4395509Z CUDA_PATH=/usr/local/cuda
2025-12-04T09:05:30.4395889Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux
2025-12-04T09:05:30.4396297Z GITHUB_SERVER_URL=https://github.com
2025-12-04T09:05:30.4396539Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96
2025-12-04T09:05:30.4396776Z REENABLED_ISSUES=
2025-12-04T09:05:30.4396927Z DOCS=
2025-12-04T09:05:30.4397074Z SHLVL=1
2025-12-04T09:05:30.4397208Z MAX_JOBS=14
2025-12-04T09:05:30.4397359Z GITHUB_ACTOR_ID=475357
2025-12-04T09:05:30.4397601Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:05:30.4397866Z GITHUB_REF_NAME=main
2025-12-04T09:05:30.4398131Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla
2025-12-04T09:05:30.4398437Z GITHUB_JOB=test
2025-12-04T09:05:30.4398595Z NO_TEST_TIMEOUT=False
2025-12-04T09:05:30.4398770Z TD_DISTRIBUTED=False
2025-12-04T09:05:30.4398953Z GITHUB_REPOSITORY=pytorch/pytorch
2025-12-04T09:05:30.4399155Z GITHUB_RETENTION_DAYS=90
2025-12-04T09:05:30.4399337Z OPENSSL_DIR=/opt/openssl
2025-12-04T09:05:30.4399525Z GITHUB_ACTION_REPOSITORY=
2025-12-04T09:05:30.4400187Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:05:30.4400751Z GITHUB_BASE_REF=
2025-12-04T09:05:30.4400915Z INSTALLED_ACL=
2025-12-04T09:05:30.4401251Z ARTIFACTS_FILE_SUFFIX=test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862
2025-12-04T09:05:30.4401624Z CI=true
2025-12-04T09:05:30.4401787Z GITHUB_REPOSITORY_OWNER=pytorch
2025-12-04T09:05:30.4402023Z RUST_LOG=sccache::server=error
2025-12-04T09:05:30.4402204Z JOB_ID=57116084862
2025-12-04T09:05:30.4402361Z GITHUB_HEAD_REF=
2025-12-04T09:05:30.4402523Z GITHUB_ACTION_REF=
2025-12-04T09:05:30.4402717Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
2025-12-04T09:05:30.4402968Z TEST_SHOWLOCALS=False
2025-12-04T09:05:30.4403140Z GITHUB_WORKFLOW=trunk
2025-12-04T09:05:30.4403309Z DEBIAN_FRONTEND=noninteractive
2025-12-04T09:05:30.4403746Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_68cc5a94-c390-4b58-b946-14203b5a4e58
2025-12-04T09:05:30.4404186Z NO_TD=False
2025-12-04T09:05:30.4404354Z SKIP_SCCACHE_INITIALIZATION=1
2025-12-04T09:05:30.4404563Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/
2025-12-04T09:05:30.4404780Z _=/usr/bin/env
2025-12-04T09:05:30.4405028Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:05:30.4405398Z ++ python -c 'import site; print(site.getsitepackages()[0])'
2025-12-04T09:05:30.4504803Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch
2025-12-04T09:05:30.4505803Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin
2025-12-04T09:05:30.4506616Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib
2025-12-04T09:05:30.4507251Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test
2025-12-04T09:05:30.4507726Z + BUILD_DIR=build
2025-12-04T09:05:30.4508034Z + BUILD_RENAMED_DIR=build_renamed
2025-12-04T09:05:30.4508380Z + BUILD_BIN_DIR=build/bin
2025-12-04T09:05:30.4508682Z + SHARD_NUMBER=2
2025-12-04T09:05:30.4508965Z + NUM_TEST_SHARDS=5
2025-12-04T09:05:30.4509273Z + export TORCH_SERIALIZATION_DEBUG=1
2025-12-04T09:05:30.4509657Z + TORCH_SERIALIZATION_DEBUG=1
2025-12-04T09:05:30.4510170Z + export VALGRIND=ON
2025-12-04T09:05:30.4510424Z + VALGRIND=ON
2025-12-04T09:05:30.4510640Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *clang9* ]]
2025-12-04T09:05:30.4511121Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *xpu* ]]
2025-12-04T09:05:30.4511378Z + detect_cuda_arch
2025-12-04T09:05:30.4511574Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]]
2025-12-04T09:05:30.4511838Z + command -v nvidia-smi
2025-12-04T09:05:30.4512158Z /usr/bin/nvidia-smi
2025-12-04T09:05:30.4516534Z ++ nvidia-smi --query-gpu=compute_cap --format=csv
2025-12-04T09:05:30.4517583Z ++ tail -n 1
2025-12-04T09:05:30.4716077Z + TORCH_CUDA_ARCH_LIST=8.9
2025-12-04T09:05:30.4716516Z + export TORCH_CUDA_ARCH_LIST
2025-12-04T09:05:30.4716930Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *s390x* ]]
2025-12-04T09:05:30.4717556Z + [[ 0 == \1 ]]
2025-12-04T09:05:30.4717824Z + [[ True == \1 ]]
2025-12-04T09:05:30.4718048Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *bazel* ]]
2025-12-04T09:05:30.4721274Z ++ realpath build/custom_test_artifacts
2025-12-04T09:05:30.4730552Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts
2025-12-04T09:05:30.4731102Z + [[ -n '' ]]
2025-12-04T09:05:30.4731292Z + echo 'Environment variables'
2025-12-04T09:05:30.4731510Z Environment variables
2025-12-04T09:05:30.4731687Z + env
2025-12-04T09:05:30.4737618Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:05:30.4737972Z CONTINUE_THROUGH_ERROR=True
2025-12-04T09:05:30.4738369Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T09:05:30.4739044Z VLLM_TEST_HUGGING_FACE_TOKEN=***
2025-12-04T09:05:30.4739412Z HOSTNAME=e29498c26bf7
2025-12-04T09:05:30.4740181Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_68cc5a94-c390-4b58-b946-14203b5a4e58
2025-12-04T09:05:30.4740637Z GITHUB_ACTION=__run_3
2025-12-04T09:05:30.4740948Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1
2025-12-04T09:05:30.4741327Z GITHUB_RUN_NUMBER=158165
2025-12-04T09:05:30.4741658Z TEST_CONFIG=default
2025-12-04T09:05:30.4741977Z GITHUB_REPOSITORY_OWNER_ID=21003710
2025-12-04T09:05:30.4742383Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all
2025-12-04T09:05:30.4742792Z SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:05:30.4743315Z SCRIBE_GRAPHQL_ACCESS_TOKEN=***
2025-12-04T09:05:30.4743688Z GITHUB_TRIGGERING_ACTOR=huydhn
2025-12-04T09:05:30.4744044Z GITHUB_REF_TYPE=branch
2025-12-04T09:05:30.4744359Z TORCH_CUDA_ARCH_LIST=8.9
2025-12-04T09:05:30.4744754Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:05:30.4745174Z XLA_CUDA=
2025-12-04T09:05:30.4745456Z NCCL_LIB_DIR=/usr/local/cuda/lib64/
2025-12-04T09:05:30.4746234Z HUGGING_FACE_HUB_TOKEN=***
2025-12-04T09:05:30.4746650Z ***
2025-12-04T09:05:30.4746920Z GITHUB_REPOSITORY_ID=65600975
2025-12-04T09:05:30.4747281Z GITHUB_ACTIONS=true
2025-12-04T09:05:30.4747593Z NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:30.4748022Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:05:30.4748513Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:05:30.4748986Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:05:30.4749627Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/heads/main
2025-12-04T09:05:30.4750208Z UCC_HOME=/usr
2025-12-04T09:05:30.4750497Z TORCH_SERIALIZATION_DEBUG=1
2025-12-04T09:05:30.4750826Z VERBOSE_TEST_LOGS=False
2025-12-04T09:05:30.4751146Z GITHUB_REF=refs/heads/main
2025-12-04T09:05:30.4751461Z SHARD_NUMBER=2
2025-12-04T09:05:30.4751734Z GITHUB_REF_PROTECTED=true
2025-12-04T09:05:30.4752044Z HOME=/var/lib/jenkins
2025-12-04T09:05:30.4752388Z GITHUB_API_URL=https://api.github.com
2025-12-04T09:05:30.4752787Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0
2025-12-04T09:05:30.4753228Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152
2025-12-04T09:05:30.4753664Z USE_SYSTEM_NCCL=1
2025-12-04T09:05:30.4753851Z NUM_TEST_SHARDS=5
2025-12-04T09:05:30.4753998Z UCX_HOME=/usr
2025-12-04T09:05:30.4754424Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_68cc5a94-c390-4b58-b946-14203b5a4e58
2025-12-04T09:05:30.4755124Z JOB_NAME=linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check)
2025-12-04T09:05:30.4755990Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_68cc5a94-c390-4b58-b946-14203b5a4e58
2025-12-04T09:05:30.4756572Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json
2025-12-04T09:05:30.4757057Z GITHUB_EVENT_NAME=schedule
2025-12-04T09:05:30.4757256Z DASHBOARD_TAG=
2025-12-04T09:05:30.4757414Z GITHUB_RUN_ID=19922768520
2025-12-04T09:05:30.4757598Z INSTALLED_OPENBLAS=
2025-12-04T09:05:30.4758038Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_68cc5a94-c390-4b58-b946-14203b5a4e58
2025-12-04T09:05:30.4758499Z GITHUB_ACTOR=huydhn
2025-12-04T09:05:30.4758657Z PR_NUMBER=
2025-12-04T09:05:30.4758804Z DESIRED_CUDA=12.8.1
2025-12-04T09:05:30.4758955Z GITHUB_RUN_ATTEMPT=1
2025-12-04T09:05:30.4759117Z VALGRIND=ON
2025-12-04T09:05:30.4759273Z ANACONDA_PYTHON_VERSION=3.10
2025-12-04T09:05:30.4759505Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql
2025-12-04T09:05:30.4759751Z TERM=vt100
2025-12-04T09:05:30.4760032Z INSTALLED_VISION=yes
2025-12-04T09:05:30.4760204Z BRANCH=main
2025-12-04T09:05:30.4760358Z SCCACHE_REGION=us-east-1
2025-12-04T09:05:30.4760544Z OPENSSL_ROOT_DIR=/opt/openssl
2025-12-04T09:05:30.4760736Z BUILD_AOT_INDUCTOR_TEST=
2025-12-04T09:05:30.4760922Z CUDA_PATH=/usr/local/cuda
2025-12-04T09:05:30.4761290Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux
2025-12-04T09:05:30.4761694Z GITHUB_SERVER_URL=https://github.com
2025-12-04T09:05:30.4761951Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96
2025-12-04T09:05:30.4762193Z REENABLED_ISSUES=
2025-12-04T09:05:30.4762342Z DOCS=
2025-12-04T09:05:30.4762478Z SHLVL=1
2025-12-04T09:05:30.4762615Z MAX_JOBS=14
2025-12-04T09:05:30.4762758Z GITHUB_ACTOR_ID=475357
2025-12-04T09:05:30.4762997Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:05:30.4763267Z GITHUB_REF_NAME=main
2025-12-04T09:05:30.4763529Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla
2025-12-04T09:05:30.4763831Z GITHUB_JOB=test
2025-12-04T09:05:30.4763992Z NO_TEST_TIMEOUT=False
2025-12-04T09:05:30.4764165Z TD_DISTRIBUTED=False
2025-12-04T09:05:30.4764356Z GITHUB_REPOSITORY=pytorch/pytorch
2025-12-04T09:05:30.4764569Z GITHUB_RETENTION_DAYS=90
2025-12-04T09:05:30.4764749Z OPENSSL_DIR=/opt/openssl
2025-12-04T09:05:30.4764937Z GITHUB_ACTION_REPOSITORY=
2025-12-04T09:05:30.4765496Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:05:30.4766053Z GITHUB_BASE_REF=
2025-12-04T09:05:30.4766211Z INSTALLED_ACL=
2025-12-04T09:05:30.4766550Z ARTIFACTS_FILE_SUFFIX=test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862
2025-12-04T09:05:30.4766925Z CI=true
2025-12-04T09:05:30.4767071Z GITHUB_REPOSITORY_OWNER=pytorch
2025-12-04T09:05:30.4767302Z RUST_LOG=sccache::server=error
2025-12-04T09:05:30.4767486Z JOB_ID=57116084862
2025-12-04T09:05:30.4767661Z GITHUB_HEAD_REF=
2025-12-04T09:05:30.4767812Z GITHUB_ACTION_REF=
2025-12-04T09:05:30.4768026Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
2025-12-04T09:05:30.4768272Z TEST_SHOWLOCALS=False
2025-12-04T09:05:30.4768436Z GITHUB_WORKFLOW=trunk
2025-12-04T09:05:30.4768613Z DEBIAN_FRONTEND=noninteractive
2025-12-04T09:05:30.4769045Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_68cc5a94-c390-4b58-b946-14203b5a4e58
2025-12-04T09:05:30.4769483Z NO_TD=False
2025-12-04T09:05:30.4769647Z SKIP_SCCACHE_INITIALIZATION=1
2025-12-04T09:05:30.4769861Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/
2025-12-04T09:05:30.4770175Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:05:30.4770462Z _=/usr/bin/env
2025-12-04T09:05:30.4770619Z + echo 'Testing pytorch'
2025-12-04T09:05:30.4770798Z Testing pytorch
2025-12-04T09:05:30.4770958Z + export LANG=C.UTF-8
2025-12-04T09:05:30.4771123Z + LANG=C.UTF-8
2025-12-04T09:05:30.4771401Z + PR_NUMBER=
2025-12-04T09:05:30.4771565Z + [[ default == \d\e\f\a\u\l\t ]]
2025-12-04T09:05:30.4771774Z + export CUDA_VISIBLE_DEVICES=0
2025-12-04T09:05:30.4771972Z + CUDA_VISIBLE_DEVICES=0
2025-12-04T09:05:30.4772154Z + export HIP_VISIBLE_DEVICES=0
2025-12-04T09:05:30.4772421Z + HIP_VISIBLE_DEVICES=0
2025-12-04T09:05:30.4772603Z + [[ default == \d\i\s\t\r\i\b\u\t\e\d ]]
2025-12-04T09:05:30.4772808Z + [[ default == \s\l\o\w ]]
2025-12-04T09:05:30.4773058Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *slow-gradcheck* ]]
2025-12-04T09:05:30.4773373Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]]
2025-12-04T09:05:30.4773632Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda
2025-12-04T09:05:30.4773874Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda
2025-12-04T09:05:30.4774087Z + [[ default == *crossref* ]]
2025-12-04T09:05:30.4774304Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]]
2025-12-04T09:05:30.4774566Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *xpu* ]]
2025-12-04T09:05:30.4774840Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *-bazel-* ]]
2025-12-04T09:05:30.4775093Z + pip_install ninja==1.10.2
2025-12-04T09:05:30.4775347Z + pip_install_pkg='python3 -m pip install --progress-bar off'
2025-12-04T09:05:30.4775691Z + python3 -m pip install --progress-bar off ninja==1.10.2
2025-12-04T09:05:31.2389876Z Collecting ninja==1.10.2
2025-12-04T09:05:31.2597517Z   Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB)
2025-12-04T09:05:31.2868142Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB)
2025-12-04T09:05:31.6365854Z Installing collected packages: ninja
2025-12-04T09:05:31.6366381Z   Attempting uninstall: ninja
2025-12-04T09:05:31.6373396Z     Found existing installation: ninja 1.11.1.4
2025-12-04T09:05:31.6395630Z     Uninstalling ninja-1.11.1.4:
2025-12-04T09:05:31.6516665Z       Successfully uninstalled ninja-1.11.1.4
2025-12-04T09:05:31.7130726Z Successfully installed ninja-1.10.2
2025-12-04T09:05:31.7567980Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:05:31.7569436Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:05:31.7570345Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *aarch64* ]]
2025-12-04T09:05:31.7570734Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *asan* ]]
2025-12-04T09:05:31.7571110Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-debug* ]]
2025-12-04T09:05:31.7571722Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *-bazel-* ]]
2025-12-04T09:05:31.7572355Z + echo 'We are not in debug mode: linux-jammy-cuda12.8-py3.10-gcc11. Expect the assertion to pass'
2025-12-04T09:05:31.7572983Z We are not in debug mode: linux-jammy-cuda12.8-py3.10-gcc11. Expect the assertion to pass
2025-12-04T09:05:31.7573416Z + cd test
2025-12-04T09:05:31.7573749Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)'
2025-12-04T09:05:33.1731254Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]]
2025-12-04T09:05:33.1731728Z + [[ default == \n\o\g\p\u\_\A\V\X\5\1\2 ]]
2025-12-04T09:05:33.1732207Z + [[ default == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]]
2025-12-04T09:05:33.1736367Z + DYNAMO_BENCHMARK_FLAGS=()
2025-12-04T09:05:33.1736727Z + [[ default == *pr_time_benchmarks* ]]
2025-12-04T09:05:33.1737022Z + [[ default == *dynamo_eager* ]]
2025-12-04T09:05:33.1737272Z + [[ default == *aot_eager* ]]
2025-12-04T09:05:33.1737515Z + [[ default == *aot_inductor* ]]
2025-12-04T09:05:33.1737790Z + [[ default == *max_autotune_inductor* ]]
2025-12-04T09:05:33.1738058Z + [[ default == *inductor* ]]
2025-12-04T09:05:33.1738295Z + [[ default == *dynamic* ]]
2025-12-04T09:05:33.1738522Z + [[ default == *cpu* ]]
2025-12-04T09:05:33.1738731Z + [[ default == *xpu* ]]
2025-12-04T09:05:33.1738986Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda)
2025-12-04T09:05:33.1757224Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *libtorch* ]]
2025-12-04T09:05:33.1757636Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-bazel-* ]]
2025-12-04T09:05:33.1760543Z + cd test
2025-12-04T09:05:33.1761251Z + python -c 'import torch; print(torch.__config__.show())'
2025-12-04T09:05:34.6320231Z PyTorch built with:
2025-12-04T09:05:34.6320529Z   - GCC 11.4
2025-12-04T09:05:34.6320735Z   - C++ Version: 201703
2025-12-04T09:05:34.6321265Z   - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
2025-12-04T09:05:34.6321936Z   - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)
2025-12-04T09:05:34.6322326Z   - OpenMP 201511 (a.k.a. OpenMP 4.5)
2025-12-04T09:05:34.6322636Z   - LAPACK is enabled (usually provided by MKL)
2025-12-04T09:05:34.6322929Z   - NNPACK is enabled
2025-12-04T09:05:34.6323157Z   - CPU capability usage: AVX2
2025-12-04T09:05:34.6323407Z   - CUDA Runtime 12.8
2025-12-04T09:05:34.6323877Z   - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_89,code=sm_89
2025-12-04T09:05:34.6324364Z   - CuDNN 91.0.2  (built against CUDA 12.9)
2025-12-04T09:05:34.6328357Z   - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=35b7a9a26c5923d98aebaa41a031dae21788a9ee, CUDA_VERSION=12.8, CUDNN_VERSION=9.10.2, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Werror -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=ON, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 
2025-12-04T09:05:34.6331918Z 
2025-12-04T09:05:34.8978872Z + cd test
2025-12-04T09:05:34.8979293Z + python -c 'import torch; print(torch.__config__.parallel_info())'
2025-12-04T09:05:36.0642371Z ATen/Parallel:
2025-12-04T09:05:36.0642687Z 	at::get_num_threads() : 8
2025-12-04T09:05:36.0642972Z 	at::get_num_interop_threads() : 8
2025-12-04T09:05:36.0643261Z OpenMP 201511 (a.k.a. OpenMP 4.5)
2025-12-04T09:05:36.0643536Z 	omp_get_max_threads() : 8
2025-12-04T09:05:36.0644058Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
2025-12-04T09:05:36.0644629Z 	mkl_get_max_threads() : 8
2025-12-04T09:05:36.0644974Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)
2025-12-04T09:05:36.0645372Z std::thread::hardware_concurrency() : 16
2025-12-04T09:05:36.0645651Z Environment variables:
2025-12-04T09:05:36.0645876Z 	OMP_NUM_THREADS : [not set]
2025-12-04T09:05:36.0646132Z 	MKL_NUM_THREADS : [not set]
2025-12-04T09:05:36.0646371Z ATen parallel backend: OpenMP
2025-12-04T09:05:36.0646529Z 
2025-12-04T09:05:36.2945935Z + [[ default == *numpy_2* ]]
2025-12-04T09:05:36.2946330Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *aarch64* ]]
2025-12-04T09:05:36.2946679Z + [[ default == *backward* ]]
2025-12-04T09:05:36.2946977Z + [[ default == *libtorch_agnostic_targetting* ]]
2025-12-04T09:05:36.2947270Z + [[ default == *xla* ]]
2025-12-04T09:05:36.2947491Z + [[ default == *vllm* ]]
2025-12-04T09:05:36.2947725Z + [[ default == *executorch* ]]
2025-12-04T09:05:36.2947978Z + [[ default == \j\i\t\_\l\e\g\a\c\y ]]
2025-12-04T09:05:36.2948665Z + [[ default == \q\u\a\n\t\i\z\a\t\i\o\n ]]
2025-12-04T09:05:36.2949035Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *libtorch* ]]
2025-12-04T09:05:36.2949365Z + [[ default == distributed ]]
2025-12-04T09:05:36.2949626Z + [[ default == *operator_benchmark* ]]
2025-12-04T09:05:36.2949915Z + [[ default == *operator_microbenchmark* ]]
2025-12-04T09:05:36.2950424Z + [[ default == *attention_microbenchmark* ]]
2025-12-04T09:05:36.2950708Z + [[ default == *inductor_distributed* ]]
2025-12-04T09:05:36.2950989Z + [[ default == *inductor-halide* ]]
2025-12-04T09:05:36.2951269Z + [[ default == *inductor-pallas* ]]
2025-12-04T09:05:36.2951578Z + [[ default == *inductor-triton-cpu* ]]
2025-12-04T09:05:36.2951880Z + [[ default == *inductor-micro-benchmark* ]]
2025-12-04T09:05:36.2952197Z + [[ default == *aoti_cross_compile_for_windows* ]]
2025-12-04T09:05:36.2952505Z + [[ default == *huggingface* ]]
2025-12-04T09:05:36.2952750Z + [[ default == *timm* ]]
2025-12-04T09:05:36.2952979Z + [[ default == cachebench ]]
2025-12-04T09:05:36.2953232Z + [[ default == verify_cachebench ]]
2025-12-04T09:05:36.2953482Z + [[ default == *torchbench* ]]
2025-12-04T09:05:36.2953740Z + [[ default == *inductor_cpp_wrapper* ]]
2025-12-04T09:05:36.2954017Z + [[ default == *inductor_core* ]]
2025-12-04T09:05:36.2954262Z + [[ default == *inductor* ]]
2025-12-04T09:05:36.2954505Z + [[ default == *einops* ]]
2025-12-04T09:05:36.2954748Z + [[ default == *dynamo_core* ]]
2025-12-04T09:05:36.2954995Z + [[ default == *dynamo_wrapped* ]]
2025-12-04T09:05:36.2955299Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]]
2025-12-04T09:05:36.2955592Z + [[ 2 == 1 ]]
2025-12-04T09:05:36.2955767Z + [[ 2 == 2 ]]
2025-12-04T09:05:36.2955961Z + [[ 5 -gt 1 ]]
2025-12-04T09:05:36.2956161Z + install_torchvision
2025-12-04T09:05:36.2956376Z + local orig_preload
2025-12-04T09:05:36.2956583Z + local commit
2025-12-04T09:05:36.2956784Z ++ get_pinned_commit vision
2025-12-04T09:05:36.2957038Z ++ cat .github/ci_commit_pins/vision.txt
2025-12-04T09:05:36.2966837Z + commit=617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:05:36.2967288Z + orig_preload=
2025-12-04T09:05:36.2967490Z + '[' -n '' ']'
2025-12-04T09:05:36.2967781Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]]
2025-12-04T09:05:36.2968069Z + export FORCE_CUDA=1
2025-12-04T09:05:36.2968328Z + FORCE_CUDA=1
2025-12-04T09:05:36.2968500Z + export WITH_CUDA=1
2025-12-04T09:05:36.2968669Z + WITH_CUDA=1
2025-12-04T09:05:36.2969083Z + pip_build_and_install git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e dist/vision
2025-12-04T09:05:36.2969723Z + local build_target=git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:05:36.2970126Z + local wheel_dir=dist/vision
2025-12-04T09:05:36.2970324Z + local found_whl=0
2025-12-04T09:05:36.2970508Z + for file in "${wheel_dir}"/*.whl
2025-12-04T09:05:36.2970716Z + [[ -f dist/vision/*.whl ]]
2025-12-04T09:05:36.2970900Z + '[' 0 == 0 ']'
2025-12-04T09:05:36.2971405Z + python3 -m pip wheel --no-build-isolation --no-deps -w dist/vision git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:05:36.5813901Z Collecting git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:05:36.5817964Z   Cloning https://github.com/pytorch/vision.git (to revision 617079d944b0e72632311c30ae2bbdf1168b901e) to /tmp/pip-req-build-rn60iyfs
2025-12-04T09:05:36.5841298Z   Running command git clone --filter=blob:none --quiet https://github.com/pytorch/vision.git /tmp/pip-req-build-rn60iyfs
2025-12-04T09:05:37.9492053Z   Running command git rev-parse -q --verify 'sha^617079d944b0e72632311c30ae2bbdf1168b901e'
2025-12-04T09:05:37.9515057Z   Running command git fetch -q https://github.com/pytorch/vision.git 617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:05:38.0483368Z   Resolved https://github.com/pytorch/vision.git to commit 617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:05:39.8509343Z   Preparing metadata (pyproject.toml) ... [?25l- \ | done
2025-12-04T09:05:39.8542901Z [?25hBuilding wheels for collected packages: torchvision
2025-12-04T09:06:51.6966651Z   Building wheel for torchvision (pyproject.toml) ... [?25l- \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - done
2025-12-04T09:06:51.6995877Z [?25h  Created wheel for torchvision: filename=torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl size=1786644 sha256=d9c504dd778bc57c39d094deb7a120c77dff6453fc9c022eea9fa4143dbf79b1
2025-12-04T09:06:51.6997072Z   Stored in directory: /var/lib/jenkins/.cache/pip/wheels/12/b2/29/1f82685c5b5173629e1f36a9b93989ce92ce563e5fb91d27ac
2025-12-04T09:06:51.7031525Z Successfully built torchvision
2025-12-04T09:06:51.7968672Z + for file in "${wheel_dir}"/*.whl
2025-12-04T09:06:51.7969275Z + pip_install_whl dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:06:51.7969938Z + args=('dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl')
2025-12-04T09:06:51.7970356Z + local args
2025-12-04T09:06:51.7970719Z + [[ dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl == *\ * ]]
2025-12-04T09:06:51.7971149Z + for path in "${args[@]}"
2025-12-04T09:06:51.7971562Z + echo 'Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl'
2025-12-04T09:06:51.7972175Z Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:06:51.7972866Z + python3 -mpip install --no-index --no-deps dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:06:52.0931172Z Processing ./dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:06:52.1014085Z Installing collected packages: torchvision
2025-12-04T09:06:52.5121697Z Successfully installed torchvision-0.25.0a0+617079d
2025-12-04T09:06:52.5395949Z + '[' -n '' ']'
2025-12-04T09:06:52.5396269Z + test_python_shard 2
2025-12-04T09:06:52.5396506Z + [[ -z 5 ]]
2025-12-04T09:06:52.5397251Z + python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --exclude-quantization-tests --shard 2 5 --verbose --upload-artifacts-while-running
2025-12-04T09:06:56.8538795Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json
2025-12-04T09:06:56.8996095Z Ignoring disabled issues:  ['']
2025-12-04T09:06:56.9073536Z Found test times from artifacts
2025-12-04T09:06:56.9392189Z Found test times from artifacts
2025-12-04T09:06:56.9401453Z Running all tests
2025-12-04T09:06:56.9987745Z Running parallel tests on 1 processes
2025-12-04T09:06:56.9994724Z Name: tests to run (est. time: 273.21min)
2025-12-04T09:06:56.9995230Z   Serial tests (127):
2025-12-04T09:06:56.9995617Z     inductor/test_aot_inductor 2/4
2025-12-04T09:06:56.9996060Z     dynamo/test_repros 1/1
2025-12-04T09:06:56.9996467Z     inductor/test_flex_attention 2/6
2025-12-04T09:06:56.9996802Z     inductor/test_cuda_select_algorithm 1/1
2025-12-04T09:06:56.9997144Z     inductor/test_compile_subprocess 2/2
2025-12-04T09:06:56.9997448Z     test_decomp 1/22
2025-12-04T09:06:56.9997665Z     test_decomp 5/22
2025-12-04T09:06:56.9997868Z     test_decomp 10/22
2025-12-04T09:06:56.9998073Z     test_decomp 15/22
2025-12-04T09:06:56.9998299Z     test_decomp 20/22
2025-12-04T09:06:56.9998518Z     test_ci_sanity_check_fail 1/1
2025-12-04T09:06:56.9998761Z     test_ops 3/9
2025-12-04T09:06:56.9998953Z     test_ops 8/9
2025-12-04T09:06:56.9999196Z     inductor/test_torchinductor_dynamic_shapes 4/4
2025-12-04T09:06:56.9999509Z     inductor/test_torchinductor_opinfo 2/13
2025-12-04T09:06:56.9999753Z     inductor/test_torchinductor_opinfo 7/13
2025-12-04T09:06:57.0000090Z     inductor/test_torchinductor_opinfo 12/13
2025-12-04T09:06:57.0000324Z     inductor/test_cuda_repro 1/1
2025-12-04T09:06:57.0000538Z     inductor/test_compiled_autograd 1/2
2025-12-04T09:06:57.0000762Z     inductor/test_layout_optim 1/1
2025-12-04T09:06:57.0001269Z     dynamo/test_exc 1/1
2025-12-04T09:06:57.0001576Z     inductor/test_aot_inductor_arrayref 1/2
2025-12-04T09:06:57.0001896Z     inductor/test_halide 1/1
2025-12-04T09:06:57.0002100Z     inductor/test_deterministic 1/3
2025-12-04T09:06:57.0002315Z     dynamo/test_deque_reconstruct 1/1
2025-12-04T09:06:57.0002746Z     inductor/test_inductor_annotations 1/1
2025-12-04T09:06:57.0003128Z     inductor/test_compile_worker 1/1
2025-12-04T09:06:57.0003495Z     dynamo/test_fx_passes_pre_grad 1/1
2025-12-04T09:06:57.0003850Z     inductor/test_fp8 1/1
2025-12-04T09:06:57.0004139Z     inductor/test_flex_flash 1/1
2025-12-04T09:06:57.0004344Z     inductor/test_segmented_tree 1/1
2025-12-04T09:06:57.0004561Z     inductor/test_kernel_optimization 1/1
2025-12-04T09:06:57.0004931Z     inductor/test_metrics 1/1
2025-12-04T09:06:57.0005142Z     export/test_unflatten_training_ir 1/1
2025-12-04T09:06:57.0005357Z     inductor/test_triton_kernels 1/1
2025-12-04T09:06:57.0005565Z     inductor/test_lookup_table 1/1
2025-12-04T09:06:57.0005813Z     inductor/test_cutedsl_template 1/1
2025-12-04T09:06:57.0006112Z     inductor/test_benchmark_fusion 1/1
2025-12-04T09:06:57.0006463Z     export/test_serdes 1/1
2025-12-04T09:06:57.0006779Z     inductor/test_control_deps 1/1
2025-12-04T09:06:57.0007132Z     inductor/test_benchmarking 1/1
2025-12-04T09:06:57.0007366Z     inductor/test_helion_kernels 1/1
2025-12-04T09:06:57.0007575Z     inductor/test_quantization 1/1
2025-12-04T09:06:57.0007780Z     inductor/test_best_config 1/1
2025-12-04T09:06:57.0007973Z     export/test_tools 1/1
2025-12-04T09:06:57.0008171Z     inductor/test_compiled_optimizers 1/3
2025-12-04T09:06:57.0008410Z     inductor/test_aot_inductor_custom_ops 1/1
2025-12-04T09:06:57.0008632Z     inductor/test_control_flow 4/5
2025-12-04T09:06:57.0008837Z     dynamo/test_cudagraphs 1/1
2025-12-04T09:06:57.0009040Z     inductor/test_alignment 1/1
2025-12-04T09:06:57.0009268Z     dynamo/test_guard_serialization 1/1
2025-12-04T09:06:57.0009500Z     inductor/test_needs_exact_strides 1/1
2025-12-04T09:06:57.0009734Z     inductor/test_auto_functionalize 1/1
2025-12-04T09:06:57.0009954Z     dynamo/test_modes 1/1
2025-12-04T09:06:57.0010152Z     inductor/test_custom_partitioner_fn 1/1
2025-12-04T09:06:57.0010381Z     dynamo/test_debug_utils 1/1
2025-12-04T09:06:57.0010583Z     dynamo/test_base_hop 1/1
2025-12-04T09:06:57.0010776Z     dynamo/test_export 1/1
2025-12-04T09:06:57.0010978Z     dynamo/test_python_dispatcher 1/1
2025-12-04T09:06:57.0011205Z     export/test_swap 1/1
2025-12-04T09:06:57.0011389Z     export/test_unflatten 1/1
2025-12-04T09:06:57.0011593Z     dynamo/test_verify_correctness 1/1
2025-12-04T09:06:57.0011846Z     dynamo/test_wrap_inductor_compiled_regions 1/1
2025-12-04T09:06:57.0012110Z     dynamo/test_cudagraphs_expandable_segments 1/1
2025-12-04T09:06:57.0012346Z     inductor/test_caching 1/1
2025-12-04T09:06:57.0012542Z     dynamo/test_reorder_logs 1/1
2025-12-04T09:06:57.0012742Z     dynamo/test_subclasses 1/1
2025-12-04T09:06:57.0012931Z     dynamo/test_comptime 1/1
2025-12-04T09:06:57.0013135Z     test_privateuseone_python_backend 1/1
2025-12-04T09:06:57.0013367Z     functorch/test_rearrange 1/1
2025-12-04T09:06:57.0013563Z     functorch/test_parsing 1/1
2025-12-04T09:06:57.0013753Z     test_varlen_attention 1/1
2025-12-04T09:06:57.0013941Z     test_mkl_verbose 1/1
2025-12-04T09:06:57.0014121Z     test_cpp_api_parity 1/1
2025-12-04T09:06:57.0014304Z     test_autoload 1/1
2025-12-04T09:06:57.0014491Z     nn/attention/test_open_registry 1/1
2025-12-04T09:06:57.0014700Z     xpu/test_fusion 1/1
2025-12-04T09:06:57.0014872Z     test_foreach 1/1
2025-12-04T09:06:57.0015036Z     test_pytree 1/1
2025-12-04T09:06:57.0015204Z     test_namedtuple_return_api 1/1
2025-12-04T09:06:57.0015425Z     profiler/test_record_function 1/1
2025-12-04T09:06:57.0015655Z     test_compile_benchmark_util 1/1
2025-12-04T09:06:57.0015893Z     test_set_default_mobile_cpu_allocator 1/1
2025-12-04T09:06:57.0016128Z     test_fake_tensor 1/1
2025-12-04T09:06:57.0016308Z     test_binary_ufuncs 1/1
2025-12-04T09:06:57.0016599Z     test_meta 2/4
2025-12-04T09:06:57.0016757Z     test_fx 1/1
2025-12-04T09:06:57.0016919Z     test_ops_gradients 2/4
2025-12-04T09:06:57.0017292Z     test_nestedtensor 3/4
2025-12-04T09:06:57.0017479Z     functorch/test_control_flow 4/4
2025-12-04T09:06:57.0017705Z     complex_tensor/test_complex_tensor 3/3
2025-12-04T09:06:57.0018084Z     optim/test_optim 1/1
2025-12-04T09:06:57.0018282Z     torch_np/numpy_tests/fft/test_pocketfft 1/1
2025-12-04T09:06:57.0018515Z     functorch/test_ops 1/9
2025-12-04T09:06:57.0018699Z     functorch/test_ops 6/9
2025-12-04T09:06:57.0018910Z     torch_np/numpy_tests/core/test_getlimits 1/1
2025-12-04T09:06:57.0019155Z     torch_np/test_ndarray_methods 1/1
2025-12-04T09:06:57.0019364Z     test_view_ops 1/1
2025-12-04T09:06:57.0019534Z     test_nn 1/1
2025-12-04T09:06:57.0019729Z     torch_np/numpy_tests/lib/test_index_tricks 1/1
2025-12-04T09:06:57.0019969Z     test_jit_autocast 1/1
2025-12-04T09:06:57.0020150Z     nn/test_pooling 1/1
2025-12-04T09:06:57.0020326Z     nn/test_embedding 1/1
2025-12-04T09:06:57.0020520Z     test_xnnpack_integration 1/1
2025-12-04T09:06:57.0020724Z     test_cuda_trace 1/1
2025-12-04T09:06:57.0020889Z     test_native_mha 1/1
2025-12-04T09:06:57.0021092Z     torch_np/numpy_tests/core/test_numerictypes 1/1
2025-12-04T09:06:57.0021347Z     test_cuda_nvml_based_avail 1/1
2025-12-04T09:06:57.0021562Z     test_function_schema 1/1
2025-12-04T09:06:57.0021752Z     test_accelerator 1/1
2025-12-04T09:06:57.0021925Z     nn/test_init 1/1
2025-12-04T09:06:57.0022120Z     torch_np/numpy_tests/core/test_scalar_methods 1/1
2025-12-04T09:06:57.0022379Z     torch_np/numpy_tests/fft/test_helper 1/1
2025-12-04T09:06:57.0022606Z     test_mobile_optimizer 1/1
2025-12-04T09:06:57.0022791Z     test_overrides 1/1
2025-12-04T09:06:57.0022975Z     torch_np/test_function_base 1/1
2025-12-04T09:06:57.0023183Z     test_type_promotion 1/1
2025-12-04T09:06:57.0023373Z     torch_np/test_scalars_0D_arrays 1/1
2025-12-04T09:06:57.0023586Z     test_cuda_primary_ctx 1/1
2025-12-04T09:06:57.0023785Z     profiler/test_profiler_tree 1/1
2025-12-04T09:06:57.0024016Z     torch_np/numpy_tests/lib/test_arraysetops 1/1
2025-12-04T09:06:57.0024239Z     test_dlpack 1/1
2025-12-04T09:06:57.0024409Z     profiler/test_torch_tidy 1/1
2025-12-04T09:06:57.0024606Z     lazy/test_reuse_ir 1/1
2025-12-04T09:06:57.0024804Z     test_functional_autograd_benchmark 1/1
2025-12-04T09:06:57.0025024Z     test_reductions 1/1
2025-12-04T09:06:57.0025198Z     test_autoload_enable 1/1
2025-12-04T09:06:57.0025378Z   Parallel tests (0):
2025-12-04T09:06:57.0025572Z Name: excluded (est. time: 0.0min)
2025-12-04T09:06:57.0025778Z   Serial tests (0):
2025-12-04T09:06:57.0025936Z   Parallel tests (0):
2025-12-04T09:06:57.0026216Z Running inductor/test_aot_inductor 2/4 ... [2025-12-04 09:06:57.000103][856.928401149]
2025-12-04T09:06:57.0026552Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:06:57.0027311Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor.py', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:06:57.000433]
2025-12-04T09:15:38.7959112Z 
2025-12-04T09:15:38.7960240Z inductor/test_aot_inductor 2/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_2.4_15a925ff16cb0669_.log
2025-12-04T09:15:38.8033583Z Running 227 items in this shard: test/inductor/test_aot_inductor.py::AOTInductorLoggingTest::test_shape_env_reuse, test/inductor/test_aot_inductor.py::TestAOTInductorConfig::test_no_compile_standalone, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_amp_fallback_random_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aot_inductor_consts_cpp_build_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_fp8_dtype_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_sym_inputs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_user_defined_triton_kernel_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_profiler_enable_kernel_profile_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_autotuning_args_reuse_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_backward_no_op_logging_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_bmm_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_3_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_cpu_predicate_cuda_operands_max_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_cpu_predicate_cuda_operands_max_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_share_predicate_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_parameters_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_dup_unbacked_sym_decl_with_refinement_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fill__fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fp8_view_of_param_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fx_gm_return_tuple_validation_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_dynamic_dim_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_mmaped_weights_on_disk_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_libtorch_free_so_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misaligned_input_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misc_1_max_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_missing_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_tensor_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_none_args_aot_codegen_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_output_path_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_hann_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeat_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_return_view_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_shape_failed_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scaled_grouped_mm_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_split_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_size_with_unbacked_add_expr_transitive_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_so_without_weight_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_subclasses_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sym_expr_indexing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_sym_i64_input_codegen_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_symbool_item_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_autotuning_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_dynamic_launcher_grid_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_update_user_managed_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_weight_on_disk_legacy_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_nested_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_offset_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_profiler_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_grid_with_unbacked_symbols_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_zero_size_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_addmm_multiple_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_constant_tensor_name_collision_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_tensor_meta_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotune_with_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_bool_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_boolean_indexing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_codegen_int_array_var_fix_memory_leak_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_composed_dynamic_size_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_symint_input_disable_one_pass_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_with_parameters_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv3d_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_copy_non_blocking_is_pinned_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_custom_op_in_subgraph_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_cat_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_scalar_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_embedding_bag_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fft_c2c_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_input_codegen_with_sympy_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_issue_140766_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_missing_cubin_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_mixed_device_1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_model_modified_weights_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_multi_device_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_nan_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_no_args_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_contiguous_output_alias_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_tensor_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_none_args_aot_codegen_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_normal_functional_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeated_calling_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_reuse_kernel_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_rocm_triton_autotuning_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_fp8_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_same_backing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scaled_grouped_mm_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sdpa_2_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sdpa_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_shifted_constraint_ranges_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_multi_arch_embed_kernel_binary_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_small_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_stft_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sym_i64_input_codegen_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_symfloat_item_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sympy_cpp_printer_min_max_minmax0_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_autotuning_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_bool_param_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_equal_to_1_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_reinterpret_view_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_upper_bound_i64_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_weight_on_disk_legacy_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_nested_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_conv_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_sym_expr_cond_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_with_profiler_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_addmm_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aliased_buffer_reuse_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_sym_inputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_assert_tensor_meta_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotune_with_constant_folding_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_backward_no_op_logging_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_bool_input_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_boolean_indexing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_and_force_mmap_weights_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_composed_dynamic_size_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_cpu_predicate_cuda_operands_max_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_symint_input_disable_one_pass_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_unbacked_symint_closure_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_conv3d_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_conv_freezing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_convolution_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_device_moved_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_duplicate_constant_folding_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_cat_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_scalar_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_embedding_bag_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_empty_cat_dtype_promotion_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_empty_graph_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fake_tensor_device_validation_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_kernel_with_symexpr_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fill__fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fp8_view_of_param_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_free_inactive_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_freezing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_inf_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_issue_140766_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_linear_dynamic_maxautotune_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_misc_1_max_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_missing_cubin_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_model_modified_weights_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_multiple_output_alias_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_narrow_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_poi_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_profile_benchmark_harness_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_quantized_linear_bias_none_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeat_interleave_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_calling_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_return_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_reuse_kernel_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_run_with_grad_enabled_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sdpa_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_seq_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_from_multi_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_with_unbacked_add_expr_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_with_unbacked_add_expr_transitive_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_symint_item_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_dynamic_launcher_grid_infer_from_tensor_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_dynamic_grid_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_extern_kernel_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_sympy_expr_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_sympy_fn_like_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_2_use_static_size_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_simple_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_conv_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_sym_expr_cond_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_profiler_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_unbacked_symbols_mps
2025-12-04T09:15:38.8104638Z 
2025-12-04T09:15:38.8104878Z Finished inductor/test_aot_inductor 2/4 ... [2025-12-04 09:15:38.796056][1378.72435061], took 8.70min
2025-12-04T09:15:38.8105763Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-d77224b10dd1e10b.xml
2025-12-04T09:15:39.2309808Z Uploading artifacts took 0.14 seconds
2025-12-04T09:15:39.2312567Z Running dynamo/test_repros 1/1 ... [2025-12-04 09:15:39.231027][1379.159323701]
2025-12-04T09:15:39.2312947Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:15:39.2315647Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_repros.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:15:39.231326]
2025-12-04T09:17:34.2826104Z 
2025-12-04T09:17:34.2827263Z dynamo/test_repros 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_repros_1.1_21fd2d3c0d4dd552_.log
2025-12-04T09:17:34.2895714Z Running 351 items in this shard: test/dynamo/test_repros.py::LRUCacheWarningTests::test_lru_cache_warning_issued_during_tracing, test/dynamo/test_repros.py::ReproTests::test_312_local_cell_overlap, test/dynamo/test_repros.py::ReproTests::test_Size, test/dynamo/test_repros.py::ReproTests::test_abc_setattr, test/dynamo/test_repros.py::ReproTests::test_add_complex_conj, test/dynamo/test_repros.py::ReproTests::test_add_sub_alpha_out, test/dynamo/test_repros.py::ReproTests::test_addr_alpha_beta_out, test/dynamo/test_repros.py::ReproTests::test_amp_foreach_fake_impl, test/dynamo/test_repros.py::ReproTests::test_aot_autograd_runtime_wrapper_prologue_profiled, test/dynamo/test_repros.py::ReproTests::test_as_strided_on_base_with_mutation_works, test/dynamo/test_repros.py::ReproTests::test_as_strided_on_existing_view_banned, test/dynamo/test_repros.py::ReproTests::test_attached_attribute_in_dir, test/dynamo/test_repros.py::ReproTests::test_autograd_function_graph_break, test/dynamo/test_repros.py::ReproTests::test_avoid_dupe_specialization, test/dynamo/test_repros.py::ReproTests::test_batch_encoding_clone_inputs, test/dynamo/test_repros.py::ReproTests::test_batch_norm_act, test/dynamo/test_repros.py::ReproTests::test_batchnorm_e2e, test/dynamo/test_repros.py::ReproTests::test_bigbird_unsqueeze_inplace, test/dynamo/test_repros.py::ReproTests::test_bitwise_op_guard, test/dynamo/test_repros.py::ReproTests::test_bitwise_print_precedence, test/dynamo/test_repros.py::ReproTests::test_boxes_len, test/dynamo/test_repros.py::ReproTests::test_build_map_unpack_with_call, test/dynamo/test_repros.py::ReproTests::test_c_defined_metaclass, test/dynamo/test_repros.py::ReproTests::test_cells_unsupported_step_exception, test/dynamo/test_repros.py::ReproTests::test_changing_stride, test/dynamo/test_repros.py::ReproTests::test_chunk_reformer_ff, test/dynamo/test_repros.py::ReproTests::test_class_member, test/dynamo/test_repros.py::ReproTests::test_classmethod_with_slots, test/dynamo/test_repros.py::ReproTests::test_clone_not_memory_dense, test/dynamo/test_repros.py::ReproTests::test_compilation_metrics_on_error, test/dynamo/test_repros.py::ReproTests::test_compile_complex_conj, test/dynamo/test_repros.py::ReproTests::test_compile_copy__int_overload, test/dynamo/test_repros.py::ReproTests::test_compiled_module_truthiness, test/dynamo/test_repros.py::ReproTests::test_const_dict_keyerror, test/dynamo/test_repros.py::ReproTests::test_contains_range_constprop, test/dynamo/test_repros.py::ReproTests::test_convert_boxes_to_pooler_format, test/dynamo/test_repros.py::ReproTests::test_copy_weird_strides, test/dynamo/test_repros.py::ReproTests::test_create_rand_mask_from_inputs, test/dynamo/test_repros.py::ReproTests::test_dalle2_maybe, test/dynamo/test_repros.py::ReproTests::test_data_attr_mutation_after_saved_for_bw, test/dynamo/test_repros.py::ReproTests::test_dataclass_in_module, test/dynamo/test_repros.py::ReproTests::test_dataclass_init_with_default_factory_with_inputs, test/dynamo/test_repros.py::ReproTests::test_ddp_checkpoint, test/dynamo/test_repros.py::ReproTests::test_dedup_global, test/dynamo/test_repros.py::ReproTests::test_deferred_runtime_asserts, test/dynamo/test_repros.py::ReproTests::test_delattr, test/dynamo/test_repros.py::ReproTests::test_delattr_raises, test/dynamo/test_repros.py::ReproTests::test_delattr_return, test/dynamo/test_repros.py::ReproTests::test_delete_local_error, test/dynamo/test_repros.py::ReproTests::test_deleted_compile_wrapper_segfault, test/dynamo/test_repros.py::ReproTests::test_delsubscr, test/dynamo/test_repros.py::ReproTests::test_delsubscr_raises, test/dynamo/test_repros.py::ReproTests::test_detectron2_instances_cat, test/dynamo/test_repros.py::ReproTests::test_disabling_unpack_hooks_within_compiled_region, test/dynamo/test_repros.py::ReproTests::test_distributions_subclass, test/dynamo/test_repros.py::ReproTests::test_do_paste_mask, test/dynamo/test_repros.py::ReproTests::test_dont_aggressively_write_assert, test/dynamo/test_repros.py::ReproTests::test_dont_dce_rand, test/dynamo/test_repros.py::ReproTests::test_dropout_inline, test/dynamo/test_repros.py::ReproTests::test_dynamic_shape_disable_duck_size, test/dynamo/test_repros.py::ReproTests::test_dynamic_shapes_double_not_equal, test/dynamo/test_repros.py::ReproTests::test_dynamic_shapes_float_guard, test/dynamo/test_repros.py::ReproTests::test_dynamic_shapes_implicit_guard, test/dynamo/test_repros.py::ReproTests::test_dynamic_shapes_right_side, test/dynamo/test_repros.py::ReproTests::test_dynamo_default_lru_cache_behavior, test/dynamo/test_repros.py::ReproTests::test_dynamo_disable_lru_cache_behavior, test/dynamo/test_repros.py::ReproTests::test_dynamo_set_recursion_limit, test/dynamo/test_repros.py::ReproTests::test_dynamo_set_recursion_limit_usage, test/dynamo/test_repros.py::ReproTests::test_ellipsis, test/dynamo/test_repros.py::ReproTests::test_embedding_backward_broadcasting_decomp, test/dynamo/test_repros.py::ReproTests::test_empty_graph_nested_calls_fullgraph_False, test/dynamo/test_repros.py::ReproTests::test_empty_graph_nested_calls_fullgraph_True, test/dynamo/test_repros.py::ReproTests::test_empty_list_contains_with_jump, test/dynamo/test_repros.py::ReproTests::test_empty_out_dynamic, test/dynamo/test_repros.py::ReproTests::test_enum, test/dynamo/test_repros.py::ReproTests::test_ephemeral_module, test/dynamo/test_repros.py::ReproTests::test_error_return_without_exception_set, test/dynamo/test_repros.py::ReproTests::test_exception_in_dynamo_handling, test/dynamo/test_repros.py::ReproTests::test_exec_import, test/dynamo/test_repros.py::ReproTests::test_exec_wildcard_import, test/dynamo/test_repros.py::ReproTests::test_export_vs_dynamo_for_multiheadattention, test/dynamo/test_repros.py::ReproTests::test_flip_bad_accuracy, test/dynamo/test_repros.py::ReproTests::test_for_loop_graph_break, test/dynamo/test_repros.py::ReproTests::test_for_loop_graph_break_before, test/dynamo/test_repros.py::ReproTests::test_foreach_decomp_arg_names, test/dynamo/test_repros.py::ReproTests::test_fsdp_set_input_mutation_applied_when_input_gets_no_gradients, test/dynamo/test_repros.py::ReproTests::test_function_in_skipfiles, test/dynamo/test_repros.py::ReproTests::test_functools_wraps, test/dynamo/test_repros.py::ReproTests::test_gan_repro_trying_to_backward_through_the_graph_a_second_time, test/dynamo/test_repros.py::ReproTests::test_generator_dealloc, test/dynamo/test_repros.py::ReproTests::test_get_parameter_dtype, test/dynamo/test_repros.py::ReproTests::test_get_type_hints, test/dynamo/test_repros.py::ReproTests::test_global_fn_mutation, test/dynamo/test_repros.py::ReproTests::test_grad, test/dynamo/test_repros.py::ReproTests::test_grad_mode_carrying_correct_state_after_graph_break, test/dynamo/test_repros.py::ReproTests::test_grad_references_cleared, test/dynamo/test_repros.py::ReproTests::test_graph_break_on_jit_isinstance, test/dynamo/test_repros.py::ReproTests::test_graph_break_on_jit_isinstance_pep585, test/dynamo/test_repros.py::ReproTests::test_graph_break_unsupported_fake, test/dynamo/test_repros.py::ReproTests::test_guard_default_device, test/dynamo/test_repros.py::ReproTests::test_guard_fail_nested_tuple, test/dynamo/test_repros.py::ReproTests::test_guard_fail_tensor_bool, test/dynamo/test_repros.py::ReproTests::test_guard_ordering_shape_fail, test/dynamo/test_repros.py::ReproTests::test_guard_same_frame_fail_message, test/dynamo/test_repros.py::ReproTests::test_guard_with_tuple_mutation, test/dynamo/test_repros.py::ReproTests::test_hasattr_builtin, test/dynamo/test_repros.py::ReproTests::test_hf_bigbird_unsqueeze, test/dynamo/test_repros.py::ReproTests::test_hf_classinstantier, test/dynamo/test_repros.py::ReproTests::test_hf_gelu_inline, test/dynamo/test_repros.py::ReproTests::test_hf_model_output, test/dynamo/test_repros.py::ReproTests::test_hf_t5_forward, test/dynamo/test_repros.py::ReproTests::test_hf_xsoftmax_inference, test/dynamo/test_repros.py::ReproTests::test_hf_xsoftmax_training, test/dynamo/test_repros.py::ReproTests::test_iadd_graph_break, test/dynamo/test_repros.py::ReproTests::test_incompatible_configs, test/dynamo/test_repros.py::ReproTests::test_indexing_with_list, test/dynamo/test_repros.py::ReproTests::test_inductor_dynamic_shapes_broadcasting, test/dynamo/test_repros.py::ReproTests::test_inductor_no_recursionerror_on_for_loops, test/dynamo/test_repros.py::ReproTests::test_inductor_rng_default_dtype, test/dynamo/test_repros.py::ReproTests::test_inference_mode_dynamic_shapes, test/dynamo/test_repros.py::ReproTests::test_inlining_cornercase, test/dynamo/test_repros.py::ReproTests::test_inplace_unsqueeze_input, test/dynamo/test_repros.py::ReproTests::test_int_format, test/dynamo/test_repros.py::ReproTests::test_intermediate_leaf_requires_grad, test/dynamo/test_repros.py::ReproTests::test_invalid_seq_unpack, test/dynamo/test_repros.py::ReproTests::test_is_make_fx_tracing, test/dynamo/test_repros.py::ReproTests::test_is_symbolic_tracing, test/dynamo/test_repros.py::ReproTests::test_isinstance_dtype, test/dynamo/test_repros.py::ReproTests::test_isinstance_storage, test/dynamo/test_repros.py::ReproTests::test_issue111522, test/dynamo/test_repros.py::ReproTests::test_issue111918, test/dynamo/test_repros.py::ReproTests::test_issue114171, test/dynamo/test_repros.py::ReproTests::test_issue126128, test/dynamo/test_repros.py::ReproTests::test_issue134451, test/dynamo/test_repros.py::ReproTests::test_issue1466_size_aot_autograd, test/dynamo/test_repros.py::ReproTests::test_issue164247_backend_eager, test/dynamo/test_repros.py::ReproTests::test_issue164247_backend_inductor, test/dynamo/test_repros.py::ReproTests::test_issue175, test/dynamo/test_repros.py::ReproTests::test_jit_script_defaults, test/dynamo/test_repros.py::ReproTests::test_jit_trace_errors, test/dynamo/test_repros.py::ReproTests::test_kwargs_out_list_variable, test/dynamo/test_repros.py::ReproTests::test_list_aliasing, test/dynamo/test_repros.py::ReproTests::test_list_index, test/dynamo/test_repros.py::ReproTests::test_list_index_not_found, test/dynamo/test_repros.py::ReproTests::test_list_index_tensor_unsupported, test/dynamo/test_repros.py::ReproTests::test_list_reverse, test/dynamo/test_repros.py::ReproTests::test_list_self_reference, test/dynamo/test_repros.py::ReproTests::test_listcomp, test/dynamo/test_repros.py::ReproTests::test_longformer_chunk, test/dynamo/test_repros.py::ReproTests::test_longtensor_list, test/dynamo/test_repros.py::ReproTests::test_lru_cache_tracing, test/dynamo/test_repros.py::ReproTests::test_maml_item_capture, test/dynamo/test_repros.py::ReproTests::test_maml_no_item_capture, test/dynamo/test_repros.py::ReproTests::test_many_overlapping_inputs_does_not_explode_guards, test/dynamo/test_repros.py::ReproTests::test_many_views_with_mutation, test/dynamo/test_repros.py::ReproTests::test_map_with_multiple_args, test/dynamo/test_repros.py::ReproTests::test_maybe_multiply_symint, test/dynamo/test_repros.py::ReproTests::test_mem_leak_guards, test/dynamo/test_repros.py::ReproTests::test_merge_criteria_processor_list1, test/dynamo/test_repros.py::ReproTests::test_merge_criteria_processor_list2, test/dynamo/test_repros.py::ReproTests::test_method_overriding, test/dynamo/test_repros.py::ReproTests::test_module_in_skipfiles, test/dynamo/test_repros.py::ReproTests::test_modules, test/dynamo/test_repros.py::ReproTests::test_multi_dot_import, test/dynamo/test_repros.py::ReproTests::test_multi_import, test/dynamo/test_repros.py::ReproTests::test_named_buffers, test/dynamo/test_repros.py::ReproTests::test_nanmean_out, test/dynamo/test_repros.py::ReproTests::test_negative_floor_div_solve, test/dynamo/test_repros.py::ReproTests::test_negative_shape_guard, test/dynamo/test_repros.py::ReproTests::test_nested_while_loop_graph_break, test/dynamo/test_repros.py::ReproTests::test_nn_module_callable, test/dynamo/test_repros.py::ReproTests::test_nn_module_property_closure, test/dynamo/test_repros.py::ReproTests::test_nn_module_stack_bc, test/dynamo/test_repros.py::ReproTests::test_nn_param_freevar_codegen, test/dynamo/test_repros.py::ReproTests::test_nn_parameter, test/dynamo/test_repros.py::ReproTests::test_nn_parameter_ctor_graph_breaks, test/dynamo/test_repros.py::ReproTests::test_nn_parametrize, test/dynamo/test_repros.py::ReproTests::test_no_grad_inline, test/dynamo/test_repros.py::ReproTests::test_no_tracing_into_eval_frame, test/dynamo/test_repros.py::ReproTests::test_no_tracing_into_eval_frame_ctx_manager, test/dynamo/test_repros.py::ReproTests::test_nonconst_issubclass, test/dynamo/test_repros.py::ReproTests::test_not_rewrite_assert_for_other_errors, test/dynamo/test_repros.py::ReproTests::test_nullcontext1, test/dynamo/test_repros.py::ReproTests::test_nullcontext2, test/dynamo/test_repros.py::ReproTests::test_numpy_not_ndarray_recompiles, test/dynamo/test_repros.py::ReproTests::test_numpy_tobytes_no_error, test/dynamo/test_repros.py::ReproTests::test_odict_get_item_index_name, test/dynamo/test_repros.py::ReproTests::test_omegaconf_dictconfig, test/dynamo/test_repros.py::ReproTests::test_omegaconf_listconfig_contains, test/dynamo/test_repros.py::ReproTests::test_omegaconf_listconfig_iter, test/dynamo/test_repros.py::ReproTests::test_ones_out_dynamic, test/dynamo/test_repros.py::ReproTests::test_optim_state_references_cleared, test/dynamo/test_repros.py::ReproTests::test_optimized_deepcopy, test/dynamo/test_repros.py::ReproTests::test_optimized_module_patched_init, test/dynamo/test_repros.py::ReproTests::test_optimized_module_training, test/dynamo/test_repros.py::ReproTests::test_os_fspath, test/dynamo/test_repros.py::ReproTests::test_out_nested_cell_shape_change, test/dynamo/test_repros.py::ReproTests::test_out_nested_cell_tuple_shape_change, test/dynamo/test_repros.py::ReproTests::test_out_none, test/dynamo/test_repros.py::ReproTests::test_out_overload_non_contiguous, test/dynamo/test_repros.py::ReproTests::test_out_root_cell_shape_change, test/dynamo/test_repros.py::ReproTests::test_out_root_cell_tuple_shape_change, test/dynamo/test_repros.py::ReproTests::test_output_aliases_intermediate, test/dynamo/test_repros.py::ReproTests::test_overlapping_inputs_with_dynamic_shapes_error, test/dynamo/test_repros.py::ReproTests::test_overwriting_params, test/dynamo/test_repros.py::ReproTests::test_partially_initialized_module_property, test/dynamo/test_repros.py::ReproTests::test_partitioner_activation_memory_budget_with_unbacked_symints, test/dynamo/test_repros.py::ReproTests::test_partitioner_cse_respects_mutation_boundaries, test/dynamo/test_repros.py::ReproTests::test_pointless_graph_removal, test/dynamo/test_repros.py::ReproTests::test_preserve_stride_with_clone, test/dynamo/test_repros.py::ReproTests::test_primtorch, test/dynamo/test_repros.py::ReproTests::test_primtorch_no_graph_break, test/dynamo/test_repros.py::ReproTests::test_randint_out_dynamic, test/dynamo/test_repros.py::ReproTests::test_recursive_map, test/dynamo/test_repros.py::ReproTests::test_reformer_eval, test/dynamo/test_repros.py::ReproTests::test_reformer_min_chunk_len, test/dynamo/test_repros.py::ReproTests::test_reformer_sorting, test/dynamo/test_repros.py::ReproTests::test_reformer_train, test/dynamo/test_repros.py::ReproTests::test_reinplacing, test/dynamo/test_repros.py::ReproTests::test_relative_import, test/dynamo/test_repros.py::ReproTests::test_relative_import_no_modulename, test/dynamo/test_repros.py::ReproTests::test_requires_grad_guards_with_grad_mode1, test/dynamo/test_repros.py::ReproTests::test_requires_grad_guards_with_grad_mode2, test/dynamo/test_repros.py::ReproTests::test_restricted_list_subclass1, test/dynamo/test_repros.py::ReproTests::test_restricted_list_subclass2, test/dynamo/test_repros.py::ReproTests::test_restricted_list_subclass3, test/dynamo/test_repros.py::ReproTests::test_return_value_duplication_mixed_grad, test/dynamo/test_repros.py::ReproTests::test_return_value_duplication_scalar, test/dynamo/test_repros.py::ReproTests::test_return_value_duplication_tensor, test/dynamo/test_repros.py::ReproTests::test_return_weakref, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_dont_change_bytecode, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_noop, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_with_msg, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_with_non_string_msg, test/dynamo/test_repros.py::ReproTests::test_rewrite_assert_without_msg, test/dynamo/test_repros.py::ReproTests::test_rng_state, test/dynamo/test_repros.py::ReproTests::test_seq_append_list, test/dynamo/test_repros.py::ReproTests::test_setattr_requires_grad_graph_breaks, test/dynamo/test_repros.py::ReproTests::test_setitem_boolean_mask_diff, test/dynamo/test_repros.py::ReproTests::test_setitem_tensor_prop, test/dynamo/test_repros.py::ReproTests::test_setitem_tuple_boolean_mask_diff, test/dynamo/test_repros.py::ReproTests::test_sigmoid_out, test/dynamo/test_repros.py::ReproTests::test_sigmoid_out2, test/dynamo/test_repros.py::ReproTests::test_size_typematch, test/dynamo/test_repros.py::ReproTests::test_slice_into_list_mutable, test/dynamo/test_repros.py::ReproTests::test_slicing_dynamic_shape, test/dynamo/test_repros.py::ReproTests::test_slicing_dynamic_shape_setitem, test/dynamo/test_repros.py::ReproTests::test_sort_out, test/dynamo/test_repros.py::ReproTests::test_sort_out2, test/dynamo/test_repros.py::ReproTests::test_specialized_stride, test/dynamo/test_repros.py::ReproTests::test_split_with_sizes_aot_autograd, test/dynamo/test_repros.py::ReproTests::test_staticmethod_allow_in_graph, test/dynamo/test_repros.py::ReproTests::test_stk_sdd_is_transposed, test/dynamo/test_repros.py::ReproTests::test_stop_iteration_reconstruct, test/dynamo/test_repros.py::ReproTests::test_str_isalnum, test/dynamo/test_repros.py::ReproTests::test_string_format, test/dynamo/test_repros.py::ReproTests::test_subclass_graph_output_repro, test/dynamo/test_repros.py::ReproTests::test_super_classmethod, test/dynamo/test_repros.py::ReproTests::test_super_classmethod_inheritance, test/dynamo/test_repros.py::ReproTests::test_super_diamond, test/dynamo/test_repros.py::ReproTests::test_super_in_staticmethod, test/dynamo/test_repros.py::ReproTests::test_super_staticmethod, test/dynamo/test_repros.py::ReproTests::test_swin_base_tensor_attr, test/dynamo/test_repros.py::ReproTests::test_symint_bitwise, test/dynamo/test_repros.py::ReproTests::test_symnode_is_not_op, test/dynamo/test_repros.py::ReproTests::test_symnode_is_op, test/dynamo/test_repros.py::ReproTests::test_sys_monitoring, test/dynamo/test_repros.py::ReproTests::test_tensor_data_kwarg, test/dynamo/test_repros.py::ReproTests::test_tensor_isinstance_tuple, test/dynamo/test_repros.py::ReproTests::test_tensor_item, test/dynamo/test_repros.py::ReproTests::test_tensor_random, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_aot_eager_func_name_func1, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_aot_eager_func_name_func2, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_aot_eager_func_name_func3, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_eager_func_name_func1, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_eager_func_name_func2, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_eager_func_name_func3, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_inductor_func_name_func1, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_inductor_func_name_func2, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_backend_inductor_func_name_func3, test/dynamo/test_repros.py::ReproTests::test_tensor_set_data_mismatched_dtype, test/dynamo/test_repros.py::ReproTests::test_tensor_split, test/dynamo/test_repros.py::ReproTests::test_tensor_split_within_device_cm, test/dynamo/test_repros.py::ReproTests::test_tensor_uniform, test/dynamo/test_repros.py::ReproTests::test_threading_local, test/dynamo/test_repros.py::ReproTests::test_tokenization, test/dynamo/test_repros.py::ReproTests::test_torch_compile_in_compile_frame, test/dynamo/test_repros.py::ReproTests::test_torch_ops_aten, test/dynamo/test_repros.py::ReproTests::test_torch_tensor_ops, test/dynamo/test_repros.py::ReproTests::test_torch_tensor_ops_no_graph_break, test/dynamo/test_repros.py::ReproTests::test_torch_variable_type, test/dynamo/test_repros.py::ReproTests::test_torchname, test/dynamo/test_repros.py::ReproTests::test_trace_functional_tensor_with, test/dynamo/test_repros.py::ReproTests::test_tuple_enum_as_key_dict, test/dynamo/test_repros.py::ReproTests::test_typed_dict, test/dynamo/test_repros.py::ReproTests::test_typed_dict_total, test/dynamo/test_repros.py::ReproTests::test_udf_classes_reconstruction, test/dynamo/test_repros.py::ReproTests::test_unbacked_arange_in_bounds, test/dynamo/test_repros.py::ReproTests::test_unbind_copy_out, test/dynamo/test_repros.py::ReproTests::test_unpack_hooks_can_be_disabled, test/dynamo/test_repros.py::ReproTests::test_unpack_hooks_dont_run_during_tracing, test/dynamo/test_repros.py::ReproTests::test_unspecialized_nn_module_with_torch_variable_attribute, test/dynamo/test_repros.py::ReproTests::test_unsqueeze_mul_strides, test/dynamo/test_repros.py::ReproTests::test_user_ctor_ctx_manager, test/dynamo/test_repros.py::ReproTests::test_user_ctor_ctx_manager_custom_init, test/dynamo/test_repros.py::ReproTests::test_user_ctor_ctx_manager_custom_init_graph_break, test/dynamo/test_repros.py::ReproTests::test_user_defined_iter, test/dynamo/test_repros.py::ReproTests::test_user_defined_object_callable, test/dynamo/test_repros.py::ReproTests::test_validate_model_kwargs, test/dynamo/test_repros.py::ReproTests::test_vc_bumped_in_inference_graph, test/dynamo/test_repros.py::ReproTests::test_vdd_duplicate_error, test/dynamo/test_repros.py::ReproTests::test_view_dtype_overload, test/dynamo/test_repros.py::ReproTests::test_weakref, test/dynamo/test_repros.py::ReproTests::test_weakref_callback, test/dynamo/test_repros.py::ReproTests::test_weakref_construction, test/dynamo/test_repros.py::ReproTests::test_weakref_del, test/dynamo/test_repros.py::ReproTests::test_weakref_proxy, test/dynamo/test_repros.py::ReproTests::test_weakref_reconstruct, test/dynamo/test_repros.py::ReproTests::test_while_loop_graph_break, test/dynamo/test_repros.py::ReproTests::test_while_loop_graph_break_inside_call_function, test/dynamo/test_repros.py::ReproTests::test_with_on_graph_break_inst, test/dynamo/test_repros.py::ReproTests::test_with_on_graph_break_nested, test/dynamo/test_repros.py::ReproTests::test_zeros_out_dynamic, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_cuda_sync_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_current_accelerator_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_data_dependent_error_log_no_print_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_deepcopy_constant_tensor_in_aot_bwd_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_filter_safe_grad_warning_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_filter_user_warnings_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_filter_warnings_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_flash_attn_backward_mixed_strides_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_getattr_return_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_guard_default_device_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_megablocks_moe_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_memleak_when_graph_input_has_tensor_attr_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_module_attribute_error_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_named_tuple_vt_clone_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_norm_dtype_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_partial_export_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_partitioner_saves_weights_for_bw_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_pytree_get_node_type_not_traced_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_pytree_get_node_type_with_namedtuple_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_pytree_tree_is_leaf_not_traced_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_pytree_tree_is_leaf_with_namedtuple_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_sdpa_dynamic_shapes_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_sub_alpha_scalar_repro_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_tensor_size_hasattr_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_torch_cuda_is_initialized_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_truthiness_of_symints_no_recompiles_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_udf_class_source_cuda, test/dynamo/test_repros.py::ReproTestsDeviceCUDA::test_zero_dim_param_mixed_device_grad_cuda
2025-12-04T09:17:34.2964079Z 
2025-12-04T09:17:34.2964304Z Finished dynamo/test_repros 1/1 ... [2025-12-04 09:17:34.282856][1494.211150867], took 1.92min
2025-12-04T09:17:34.2965014Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_repros/dynamo.test_repros-87366e2d7057b5b0.xml
2025-12-04T09:17:34.4077811Z Running inductor/test_flex_attention 2/6 ... [2025-12-04 09:17:34.407532][1494.335830862]
2025-12-04T09:17:34.4078294Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:17:34.4081135Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_flex_attention.py', '--shard-id=2', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:17:34.407854]
2025-12-04T09:26:36.5449821Z 
2025-12-04T09:26:36.5450726Z inductor/test_flex_attention 2/6 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_flex_attention_2.6_90be3f66c016358d_.log
2025-12-04T09:26:36.5500255Z Running 135 items in this shard: test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_causal_mask_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_GQA_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_backend_defaults_and_rejects_invalid_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_backend_rejects_legacy_force_use_flag_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_block_mask_non_divisible_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_automatic_dynamic_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_automatic_dynamic_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE3_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod0_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod2_BLOCK_SIZE_256_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE3_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod3_BLOCK_SIZE_128_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod4_BLOCK_SIZE_256_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod5_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod6_BLOCK_SIZE_256_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE2_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE_128_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE_256_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_different_block_size_score_mod7_BLOCK_SIZE_256_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_dynamic_score_mask_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod2_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod6_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_score_mod7_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_default_sparse_block_size_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_default_sparse_block_size_score_mod6_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_causal_block_non_divisible_with_captured_buffer_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_cpu_error_message_return_lse_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_document_masking_edge_case_mode_aot_eager_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_document_masking_edge_case_mode_eager_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_dynamic_shapes_bug_dynamic_batch_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_epilogue_fused_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order0_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order1_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order2_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order3_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_backward_stride_ordering_mode_inductor_permute_order3_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_inductor_permute_order0_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order0_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order2_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order4_shape0_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_flex_attention_stride_ordering_mode_paged_attention_permute_order4_shape1_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_fully_masked_out_rows_0_check_compile_True_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_function_composition_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kernel_options_argument_is_respected_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims0_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims1_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims0_head_dims1_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims0_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims0_score_mod5_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims0_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims1_head_dims1_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims0_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_batch_dims2_head_dims1_score_mod2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims0_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims0_head_dims0_score_mod4_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims0_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims1_head_dims1_score_mod3_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims0_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_kv_batch_broadcast_causal_mask_batch_dims2_head_dims0_score_mod7_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_correctness_score_mod0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_correctness_score_mod1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_logsumexp_correctness_score_mod1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_lse_masked_output_backend_eager_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_modular_indexing_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_njt_causal_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod1_head_dims0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod1_head_dims1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod1_head_dims1_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod3_head_dims0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod3_head_dims1_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod6_head_dims0_cuda_float32, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_equal_head_dims_score_mod7_head_dims0_cuda_bfloat16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_non_pow_2_headdim_head_dim_24_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_padded_dense_causal_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_return_aux__alibi_bias_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_return_max__rel_causal_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_silu_on_score_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s0_v_s0_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s2_v_s2_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s0_k_s3_v_s3_do_s0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s0_v_s0_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s1_v_s1_do_s2_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_strided_inputs_q_s1_k_s2_v_s2_do_s0_cuda_float16, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_subgraph_respect_decompostion_cuda, test/inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_triton_template_warp_specialization_cuda, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod0_cuda_float16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod1_cuda_bfloat16, test/inductor/test_flex_attention.py::TestPagedAttentionCUDA::test_paged_builtin_score_mods_score_mod5_cuda_float32, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_mask_attributes_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_mask_operations_with_none_q_indices_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_block_mask_vs_sequence_lengths_compile_True_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_compiling_create_block_mask_no_recompile_cuda, test/inductor/test_flex_attention.py::TestBlockMaskCUDA::test_pytree_preserves_new_attributes_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_absolute_2d_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_absolute_2d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_backprop_error_case_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_batch_head_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_distinct_biases_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_flipped_indexed_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_global_tokens_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_head_specific_gate_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_learnable_bias_global_compiled_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_local_window_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_multiplicative_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_relative_1d_bias_only_grad_batch:2_head:4_seq_len:37_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:bfloat16_mode_max-autotune-no-cudagraphs_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_symmetric_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:bfloat16_mode_default_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float16_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:256_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:277_headdim:16_dtype:float32_cuda, test/inductor/test_flex_attention.py::TestLearnableBiasesCUDA::test_weird_bias_batch:2_head:4_seq_len:37_headdim:16_dtype:bfloat16_cuda
2025-12-04T09:26:36.5549097Z 
2025-12-04T09:26:36.5549336Z Finished inductor/test_flex_attention 2/6 ... [2025-12-04 09:26:36.544472][2036.472760582], took 9.04min
2025-12-04T09:26:36.5550155Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_flex_attention/inductor.test_flex_attention-f32aa134ae4d7a45.xml
2025-12-04T09:26:36.6280935Z Running inductor/test_cuda_select_algorithm 1/1 ... [2025-12-04 09:26:36.627786][2036.556084059]
2025-12-04T09:26:36.6281455Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:26:36.6284038Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cuda_select_algorithm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:26:36.628107]
2025-12-04T10:11:57.3476574Z 
2025-12-04T10:11:57.3477472Z PRINTING LOG FILE of inductor/test_cuda_select_algorithm 1/1 (test/test-reports/inductor.test_cuda_select_algorithm_1.1_c5144f504c6801ae_.log)
2025-12-04T10:11:57.3478549Z W1204 09:26:41.492000 34150 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.3479461Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1d5e50d3220be84.xml
2025-12-04T10:11:57.3480321Z ============================= test session starts ==============================
2025-12-04T10:11:57.3480754Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.3481337Z cachedir: .pytest_cache
2025-12-04T10:11:57.3481819Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.3482465Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.3482786Z configfile: pytest.ini
2025-12-04T10:11:57.3483406Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.3484056Z collecting ... collected 58 items
2025-12-04T10:11:57.3484325Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T10:11:57.3528019Z Running 58 items in this shard: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.3569888Z 
2025-12-04T10:11:57.3570806Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0067s] [  1%]
2025-12-04T10:11:57.3572730Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5930s] [  1%]
2025-12-04T10:11:57.3574724Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.5913s] [  1%]
2025-12-04T10:11:57.3575407Z 
2025-12-04T10:11:57.3575520Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.3576419Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.3577248Z Traceback (most recent call last):
2025-12-04T10:11:57.3578032Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3578776Z     method(*args, **kwargs)
2025-12-04T10:11:57.3579475Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3580228Z     method(*args, **kwargs)
2025-12-04T10:11:57.3580933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.3581685Z     with policy():
2025-12-04T10:11:57.3582377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.3583132Z     raise RuntimeError(msg)
2025-12-04T10:11:57.3584752Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.3586292Z 
2025-12-04T10:11:57.3586521Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.3587711Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3588775Z 
2025-12-04T10:11:57.3589054Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.3589686Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3590221Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3591100Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3592041Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3592484Z graph_break []
2025-12-04T10:11:57.3593195Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.3593973Z Traceback (most recent call last):
2025-12-04T10:11:57.3594741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3595484Z     method(*args, **kwargs)
2025-12-04T10:11:57.3596191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3596930Z     method(*args, **kwargs)
2025-12-04T10:11:57.3597665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.3598436Z     with policy():
2025-12-04T10:11:57.3599117Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.3599963Z     raise RuntimeError(msg)
2025-12-04T10:11:57.3601712Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.3603264Z 
2025-12-04T10:11:57.3603501Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.3604835Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3606162Z 
2025-12-04T10:11:57.3606437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.3607126Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3607688Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3608669Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3609698Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3610158Z graph_break []
2025-12-04T10:11:57.3610571Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3611063Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3611496Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3612462Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3613293Z graph_break []
2025-12-04T10:11:57.3613594Z =================================== FAILURES ===================================
2025-12-04T10:11:57.3614426Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.3615209Z Traceback (most recent call last):
2025-12-04T10:11:57.3615980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3616686Z     method(*args, **kwargs)
2025-12-04T10:11:57.3617560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3618332Z     method(*args, **kwargs)
2025-12-04T10:11:57.3618866Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.3619478Z     with policy():
2025-12-04T10:11:57.3620154Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.3620910Z     raise RuntimeError(msg)
2025-12-04T10:11:57.3622411Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.3623972Z 
2025-12-04T10:11:57.3624217Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.3625551Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3626607Z 
2025-12-04T10:11:57.3626905Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.3627596Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3628187Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3628973Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3630012Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3630526Z graph_break []
2025-12-04T10:11:57.3630866Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3631389Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3632070Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3632865Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3633433Z graph_break []
2025-12-04T10:11:57.3633849Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3634326Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3634810Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3635707Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3636442Z graph_break []
2025-12-04T10:11:57.3637372Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1d5e50d3220be84.xml -
2025-12-04T10:11:57.3638081Z =========================== short test summary info ============================
2025-12-04T10:11:57.3640174Z FAILED [0.5913s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.3642598Z 
2025-12-04T10:11:57.3642831Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.3644124Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3645216Z 
2025-12-04T10:11:57.3645521Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.3646150Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.3646644Z ========================== 1 failed, 2 rerun in 3.22s ==========================
2025-12-04T10:11:57.3647057Z Got exit code 1
2025-12-04T10:11:57.3647337Z Retrying single test...
2025-12-04T10:11:57.3647933Z W1204 09:26:51.254000 34332 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.3649232Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-68bd725ac012aaf6.xml
2025-12-04T10:11:57.3650137Z ============================= test session starts ==============================
2025-12-04T10:11:57.3650817Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.3651410Z cachedir: .pytest_cache
2025-12-04T10:11:57.3652138Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.3652974Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.3653351Z configfile: pytest.ini
2025-12-04T10:11:57.3654054Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.3655004Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.3656384Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3657584Z Running 1 items in this shard
2025-12-04T10:11:57.3657823Z 
2025-12-04T10:11:57.3659314Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:26:52.365432592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3660850Z 
2025-12-04T10:11:57.3661320Z [W1204 09:27:01.454677478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3661971Z 
2025-12-04T10:11:57.3662533Z [W1204 09:27:01.454996623 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3663207Z 
2025-12-04T10:11:57.3663639Z [W1204 09:27:01.455565703 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3664306Z 
2025-12-04T10:11:57.3664691Z [W1204 09:27:01.455757606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3665332Z 
2025-12-04T10:11:57.3665851Z [W1204 09:27:01.456962397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3666445Z 
2025-12-04T10:11:57.3666777Z [W1204 09:27:01.457115099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3667384Z 
2025-12-04T10:11:57.3667928Z [W1204 09:27:01.457394864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3668605Z 
2025-12-04T10:11:57.3669166Z [W1204 09:27:01.457569317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3669770Z 
2025-12-04T10:11:57.3670301Z [W1204 09:27:01.466029442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3670933Z 
2025-12-04T10:11:57.3671278Z [W1204 09:27:01.466230516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3671992Z 
2025-12-04T10:11:57.3672426Z [W1204 09:27:01.466403159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3673078Z 
2025-12-04T10:11:57.3673623Z [W1204 09:27:01.466631803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3674263Z 
2025-12-04T10:11:57.3674794Z [W1204 09:27:01.466777735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3675475Z 
2025-12-04T10:11:57.3675908Z [W1204 09:27:01.467027499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3676538Z 
2025-12-04T10:11:57.3677079Z [W1204 09:27:01.467163982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3677762Z 
2025-12-04T10:11:57.3678245Z [W1204 09:27:01.467394526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3678915Z 
2025-12-04T10:11:57.3679377Z [W1204 09:27:01.467531328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3680138Z 
2025-12-04T10:11:57.3680819Z [W1204 09:27:01.554995499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3681537Z 
2025-12-04T10:11:57.3682025Z [W1204 09:27:01.555211613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3682768Z 
2025-12-04T10:11:57.3683159Z [W1204 09:27:01.555362955 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3683755Z 
2025-12-04T10:11:57.3684280Z [W1204 09:27:01.555567449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3684968Z 
2025-12-04T10:11:57.3685491Z [W1204 09:27:01.555694131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3686202Z 
2025-12-04T10:11:57.3686762Z [W1204 09:27:01.555908245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3687448Z 
2025-12-04T10:11:57.3688003Z [W1204 09:27:01.556031197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3688675Z 
2025-12-04T10:11:57.3689189Z [W1204 09:27:01.556237141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3689859Z 
2025-12-04T10:11:57.3690372Z [W1204 09:27:01.556366093 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3691079Z 
2025-12-04T10:11:57.3691237Z ('RERUN', {'yellow': True}) [11.1055s] [100%]
2025-12-04T10:11:57.3692856Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:02.785652369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3694375Z 
2025-12-04T10:11:57.3694934Z [W1204 09:27:02.785904843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3695592Z 
2025-12-04T10:11:57.3696087Z [W1204 09:27:02.786058606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3696637Z 
2025-12-04T10:11:57.3697146Z [W1204 09:27:02.786267639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3697799Z 
2025-12-04T10:11:57.3698311Z [W1204 09:27:02.786397221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3698967Z 
2025-12-04T10:11:57.3699409Z [W1204 09:27:02.786614455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3700113Z 
2025-12-04T10:11:57.3700644Z [W1204 09:27:02.786736497 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3701198Z 
2025-12-04T10:11:57.3701761Z [W1204 09:27:02.786938841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3702256Z 
2025-12-04T10:11:57.3702772Z [W1204 09:27:02.787065343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3703415Z 
2025-12-04T10:11:57.3703956Z [W1204 09:27:02.793157197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3704660Z 
2025-12-04T10:11:57.3705354Z [W1204 09:27:02.793335531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3706037Z 
2025-12-04T10:11:57.3706589Z [W1204 09:27:02.793490313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3707214Z 
2025-12-04T10:11:57.3707754Z [W1204 09:27:02.793696437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3708597Z 
2025-12-04T10:11:57.3709135Z [W1204 09:27:02.793823239 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3709815Z 
2025-12-04T10:11:57.3710228Z [W1204 09:27:02.794059723 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3710815Z 
2025-12-04T10:11:57.3711342Z [W1204 09:27:02.794195405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3712056Z 
2025-12-04T10:11:57.3712619Z [W1204 09:27:02.794403459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3713321Z 
2025-12-04T10:11:57.3713809Z [W1204 09:27:02.794525251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3714462Z 
2025-12-04T10:11:57.3715011Z [W1204 09:27:02.876171272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3715554Z 
2025-12-04T10:11:57.3716115Z [W1204 09:27:02.876396526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3716778Z 
2025-12-04T10:11:57.3717479Z [W1204 09:27:02.876548479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3718202Z 
2025-12-04T10:11:57.3718748Z [W1204 09:27:02.876754152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3719417Z 
2025-12-04T10:11:57.3719787Z [W1204 09:27:02.876879174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3720495Z 
2025-12-04T10:11:57.3721023Z [W1204 09:27:02.877092258 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3721664Z 
2025-12-04T10:11:57.3722224Z [W1204 09:27:02.877214230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3722907Z 
2025-12-04T10:11:57.3723448Z [W1204 09:27:02.877415363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3724125Z 
2025-12-04T10:11:57.3724660Z [W1204 09:27:02.877535425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3725310Z 
2025-12-04T10:11:57.3725463Z ('RERUN', {'yellow': True}) [0.5560s] [100%]
2025-12-04T10:11:57.3727004Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:03.339418552 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3728393Z 
2025-12-04T10:11:57.3728942Z [W1204 09:27:03.339633386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3729634Z 
2025-12-04T10:11:57.3730191Z [W1204 09:27:03.339783629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3730742Z 
2025-12-04T10:11:57.3731497Z [W1204 09:27:03.339995102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3732207Z 
2025-12-04T10:11:57.3732757Z [W1204 09:27:03.340143785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3733491Z 
2025-12-04T10:11:57.3734045Z [W1204 09:27:03.340381999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3734697Z 
2025-12-04T10:11:57.3735020Z [W1204 09:27:03.340513921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3735706Z 
2025-12-04T10:11:57.3736198Z [W1204 09:27:03.340720045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3736811Z 
2025-12-04T10:11:57.3737358Z [W1204 09:27:03.340844117 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3738030Z 
2025-12-04T10:11:57.3738565Z [W1204 09:27:03.346733597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3739266Z 
2025-12-04T10:11:57.3739817Z [W1204 09:27:03.346902610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3740438Z 
2025-12-04T10:11:57.3740982Z [W1204 09:27:03.347053573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3741638Z 
2025-12-04T10:11:57.3742154Z [W1204 09:27:03.347256656 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3742841Z 
2025-12-04T10:11:57.3743375Z [W1204 09:27:03.347382588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3744050Z 
2025-12-04T10:11:57.3744569Z [W1204 09:27:03.347598712 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3745265Z 
2025-12-04T10:11:57.3745799Z [W1204 09:27:03.347727074 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3746461Z 
2025-12-04T10:11:57.3746932Z [W1204 09:27:03.347931418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3747576Z 
2025-12-04T10:11:57.3748102Z [W1204 09:27:03.348057610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3748726Z 
2025-12-04T10:11:57.3749278Z [W1204 09:27:03.429108151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3749981Z 
2025-12-04T10:11:57.3750510Z [W1204 09:27:03.429291695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3751184Z 
2025-12-04T10:11:57.3751731Z [W1204 09:27:03.429441157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3752401Z 
2025-12-04T10:11:57.3752915Z [W1204 09:27:03.429645651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3753559Z 
2025-12-04T10:11:57.3754055Z [W1204 09:27:03.429770363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3754697Z 
2025-12-04T10:11:57.3755342Z [W1204 09:27:03.429984577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3759553Z 
2025-12-04T10:11:57.3760182Z [W1204 09:27:03.430126819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3760844Z 
2025-12-04T10:11:57.3761502Z [W1204 09:27:03.430348153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3798108Z 
2025-12-04T10:11:57.3798693Z [W1204 09:27:03.430469925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3799340Z 
2025-12-04T10:11:57.3799446Z FAILED [0.5523s] [100%]
2025-12-04T10:11:57.3799626Z 
2025-12-04T10:11:57.3799773Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.3800658Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.3801429Z Traceback (most recent call last):
2025-12-04T10:11:57.3802145Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3802860Z     method(*args, **kwargs)
2025-12-04T10:11:57.3803529Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3804213Z     method(*args, **kwargs)
2025-12-04T10:11:57.3804862Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.3805534Z     with policy():
2025-12-04T10:11:57.3806192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.3806883Z     raise RuntimeError(msg)
2025-12-04T10:11:57.3808403Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.3809786Z 
2025-12-04T10:11:57.3809995Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.3811168Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3812112Z 
2025-12-04T10:11:57.3812378Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.3812982Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3813495Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3814399Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3815321Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3815775Z graph_break []
2025-12-04T10:11:57.3816156Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.3817779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.3819108Z   if out == self.unknown_value:
2025-12-04T10:11:57.3819583Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.3820384Z Traceback (most recent call last):
2025-12-04T10:11:57.3821292Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3822122Z     method(*args, **kwargs)
2025-12-04T10:11:57.3822920Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3823880Z     method(*args, **kwargs)
2025-12-04T10:11:57.3824585Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.3825337Z     with policy():
2025-12-04T10:11:57.3825817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.3826639Z     raise RuntimeError(msg)
2025-12-04T10:11:57.3828322Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.3829437Z 
2025-12-04T10:11:57.3829579Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.3830328Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3830946Z 
2025-12-04T10:11:57.3831111Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.3831488Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3831802Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3832338Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3832914Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3833190Z graph_break []
2025-12-04T10:11:57.3833410Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.3834344Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.3835201Z   if out == self.unknown_value:
2025-12-04T10:11:57.3835456Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3835761Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3836059Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3836620Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3837116Z graph_break []
2025-12-04T10:11:57.3837292Z =================================== FAILURES ===================================
2025-12-04T10:11:57.3837775Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.3838242Z Traceback (most recent call last):
2025-12-04T10:11:57.3838691Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3839138Z     method(*args, **kwargs)
2025-12-04T10:11:57.3839553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3840107Z     method(*args, **kwargs)
2025-12-04T10:11:57.3840518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.3840960Z     with policy():
2025-12-04T10:11:57.3841464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.3841905Z     raise RuntimeError(msg)
2025-12-04T10:11:57.3842870Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.3843858Z 
2025-12-04T10:11:57.3844088Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.3844935Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3845547Z 
2025-12-04T10:11:57.3845718Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.3846091Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3846414Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3846940Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3847497Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3847765Z graph_break []
2025-12-04T10:11:57.3847991Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.3848912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.3849751Z   if out == self.unknown_value:
2025-12-04T10:11:57.3850004Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3850304Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3850592Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3851150Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3851631Z graph_break []
2025-12-04T10:11:57.3851842Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3852135Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3852425Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3852969Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3853454Z graph_break []
2025-12-04T10:11:57.3854039Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-68bd725ac012aaf6.xml -
2025-12-04T10:11:57.3854704Z =========================== short test summary info ============================
2025-12-04T10:11:57.3856219Z FAILED [0.5523s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.3857600Z 
2025-12-04T10:11:57.3857822Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.3858566Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3859242Z 
2025-12-04T10:11:57.3859403Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.3859748Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.3860052Z ================== 1 failed, 57 deselected, 2 rerun in 12.24s ==================
2025-12-04T10:11:57.3860305Z Got exit code 1
2025-12-04T10:11:57.3860461Z Retrying single test...
2025-12-04T10:11:57.3860835Z W1204 09:27:10.082000 34519 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.3861572Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f43d696b91c68e27.xml
2025-12-04T10:11:57.3862130Z ============================= test session starts ==============================
2025-12-04T10:11:57.3862516Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.3862869Z cachedir: .pytest_cache
2025-12-04T10:11:57.3863283Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.3863735Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.3863942Z configfile: pytest.ini
2025-12-04T10:11:57.3864368Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.3864888Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.3865685Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3866415Z Running 1 items in this shard
2025-12-04T10:11:57.3866545Z 
2025-12-04T10:11:57.3867299Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:11.193227255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3868118Z 
2025-12-04T10:11:57.3868424Z [W1204 09:27:20.420740292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3868800Z 
2025-12-04T10:11:57.3869092Z [W1204 09:27:20.421006256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3869465Z 
2025-12-04T10:11:57.3869755Z [W1204 09:27:20.421586837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3870126Z 
2025-12-04T10:11:57.3870426Z [W1204 09:27:20.421775430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3870801Z 
2025-12-04T10:11:57.3871089Z [W1204 09:27:20.423026321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3871457Z 
2025-12-04T10:11:57.3871749Z [W1204 09:27:20.423192094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3872116Z 
2025-12-04T10:11:57.3872408Z [W1204 09:27:20.423477369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3872773Z 
2025-12-04T10:11:57.3873137Z [W1204 09:27:20.423658082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3873509Z 
2025-12-04T10:11:57.3873797Z [W1204 09:27:20.432140088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3874166Z 
2025-12-04T10:11:57.3874543Z [W1204 09:27:20.432354782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3874909Z 
2025-12-04T10:11:57.3875215Z [W1204 09:27:20.432534105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3875579Z 
2025-12-04T10:11:57.3875869Z [W1204 09:27:20.432772829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3876234Z 
2025-12-04T10:11:57.3876527Z [W1204 09:27:20.432924802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3876895Z 
2025-12-04T10:11:57.3877182Z [W1204 09:27:20.433172366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3877549Z 
2025-12-04T10:11:57.3877834Z [W1204 09:27:20.433312759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3878203Z 
2025-12-04T10:11:57.3878494Z [W1204 09:27:20.433544363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3878859Z 
2025-12-04T10:11:57.3879147Z [W1204 09:27:20.433683205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3879513Z 
2025-12-04T10:11:57.3879800Z [W1204 09:27:20.522317756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3880278Z 
2025-12-04T10:11:57.3880577Z [W1204 09:27:20.522539470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3880947Z 
2025-12-04T10:11:57.3881237Z [W1204 09:27:20.522693582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3881609Z 
2025-12-04T10:11:57.3881899Z [W1204 09:27:20.522905526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3882267Z 
2025-12-04T10:11:57.3882554Z [W1204 09:27:20.523031618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3882930Z 
2025-12-04T10:11:57.3883221Z [W1204 09:27:20.523248462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3883595Z 
2025-12-04T10:11:57.3883886Z [W1204 09:27:20.523373044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3884252Z 
2025-12-04T10:11:57.3884547Z [W1204 09:27:20.523583258 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3884920Z 
2025-12-04T10:11:57.3885214Z [W1204 09:27:20.523704070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3885578Z 
2025-12-04T10:11:57.3885662Z ('RERUN', {'yellow': True}) [11.2481s] [100%]
2025-12-04T10:11:57.3886576Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:21.758525591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3887391Z 
2025-12-04T10:11:57.3887760Z [W1204 09:27:21.758776755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3888139Z 
2025-12-04T10:11:57.3888429Z [W1204 09:27:21.758925718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3888932Z 
2025-12-04T10:11:57.3889224Z [W1204 09:27:21.759141312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3889591Z 
2025-12-04T10:11:57.3889878Z [W1204 09:27:21.759267554 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3890247Z 
2025-12-04T10:11:57.3890535Z [W1204 09:27:21.759485057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3890904Z 
2025-12-04T10:11:57.3891196Z [W1204 09:27:21.759612620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3891564Z 
2025-12-04T10:11:57.3891856Z [W1204 09:27:21.759819493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3892223Z 
2025-12-04T10:11:57.3892519Z [W1204 09:27:21.759947325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3892887Z 
2025-12-04T10:11:57.3893194Z [W1204 09:27:21.766013180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3893572Z 
2025-12-04T10:11:57.3893864Z [W1204 09:27:21.766190793 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3894240Z 
2025-12-04T10:11:57.3894532Z [W1204 09:27:21.766339315 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3894898Z 
2025-12-04T10:11:57.3895193Z [W1204 09:27:21.766539979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3895561Z 
2025-12-04T10:11:57.3895856Z [W1204 09:27:21.766670551 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3896226Z 
2025-12-04T10:11:57.3896514Z [W1204 09:27:21.766884535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3896885Z 
2025-12-04T10:11:57.3897175Z [W1204 09:27:21.767005147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3897551Z 
2025-12-04T10:11:57.3897842Z [W1204 09:27:21.767208430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3898220Z 
2025-12-04T10:11:57.3898518Z [W1204 09:27:21.767327962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3898888Z 
2025-12-04T10:11:57.3899184Z [W1204 09:27:21.851271173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3899551Z 
2025-12-04T10:11:57.3899840Z [W1204 09:27:21.851492876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3900214Z 
2025-12-04T10:11:57.3900504Z [W1204 09:27:21.851644199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3900877Z 
2025-12-04T10:11:57.3901240Z [W1204 09:27:21.851849223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3901617Z 
2025-12-04T10:11:57.3901914Z [W1204 09:27:21.851974795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3902353Z 
2025-12-04T10:11:57.3902650Z [W1204 09:27:21.852190648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3903018Z 
2025-12-04T10:11:57.3903313Z [W1204 09:27:21.852325791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3903680Z 
2025-12-04T10:11:57.3903983Z [W1204 09:27:21.852536264 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3904358Z 
2025-12-04T10:11:57.3904653Z [W1204 09:27:21.852660006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3905026Z 
2025-12-04T10:11:57.3905109Z ('RERUN', {'yellow': True}) [0.5634s] [100%]
2025-12-04T10:11:57.3906005Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:22.320709798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3906823Z 
2025-12-04T10:11:57.3907124Z [W1204 09:27:22.320927482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3907493Z 
2025-12-04T10:11:57.3907783Z [W1204 09:27:22.321077785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3908159Z 
2025-12-04T10:11:57.3908451Z [W1204 09:27:22.321285018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3908826Z 
2025-12-04T10:11:57.3909116Z [W1204 09:27:22.321410630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3909484Z 
2025-12-04T10:11:57.3909782Z [W1204 09:27:22.321630174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3910149Z 
2025-12-04T10:11:57.3910446Z [W1204 09:27:22.321755207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3910813Z 
2025-12-04T10:11:57.3911105Z [W1204 09:27:22.321962870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3911482Z 
2025-12-04T10:11:57.3911772Z [W1204 09:27:22.322093732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3912149Z 
2025-12-04T10:11:57.3912441Z [W1204 09:27:22.328172587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3912808Z 
2025-12-04T10:11:57.3913107Z [W1204 09:27:22.328354230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3913485Z 
2025-12-04T10:11:57.3913789Z [W1204 09:27:22.328511113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3914158Z 
2025-12-04T10:11:57.3914450Z [W1204 09:27:22.328738527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3914826Z 
2025-12-04T10:11:57.3915117Z [W1204 09:27:22.328863869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3915489Z 
2025-12-04T10:11:57.3915868Z [W1204 09:27:22.329092213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3916240Z 
2025-12-04T10:11:57.3916538Z [W1204 09:27:22.329218635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3916973Z 
2025-12-04T10:11:57.3917579Z [W1204 09:27:22.329425059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3917967Z 
2025-12-04T10:11:57.3918261Z [W1204 09:27:22.329548061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3918639Z 
2025-12-04T10:11:57.3918928Z [W1204 09:27:22.413363729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3919314Z 
2025-12-04T10:11:57.3919609Z [W1204 09:27:22.413550912 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3920053Z 
2025-12-04T10:11:57.3920346Z [W1204 09:27:22.413699245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3920718Z 
2025-12-04T10:11:57.3921013Z [W1204 09:27:22.413903948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3921382Z 
2025-12-04T10:11:57.3921669Z [W1204 09:27:22.414033670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3922042Z 
2025-12-04T10:11:57.3922329Z [W1204 09:27:22.414246694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3922702Z 
2025-12-04T10:11:57.3922996Z [W1204 09:27:22.414373556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3923367Z 
2025-12-04T10:11:57.3923657Z [W1204 09:27:22.414579140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3924031Z 
2025-12-04T10:11:57.3924324Z [W1204 09:27:22.414701352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.3924692Z 
2025-12-04T10:11:57.3924764Z FAILED [0.5630s] [100%]
2025-12-04T10:11:57.3924875Z 
2025-12-04T10:11:57.3924968Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.3925456Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.3925935Z Traceback (most recent call last):
2025-12-04T10:11:57.3926402Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3926854Z     method(*args, **kwargs)
2025-12-04T10:11:57.3927279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3927725Z     method(*args, **kwargs)
2025-12-04T10:11:57.3928132Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.3928577Z     with policy():
2025-12-04T10:11:57.3928983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.3929425Z     raise RuntimeError(msg)
2025-12-04T10:11:57.3930518Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.3931415Z 
2025-12-04T10:11:57.3931549Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.3932312Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3933030Z 
2025-12-04T10:11:57.3933200Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.3933572Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3933890Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3934426Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3934996Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3935267Z graph_break []
2025-12-04T10:11:57.3935492Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.3936415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.3937267Z   if out == self.unknown_value:
2025-12-04T10:11:57.3937718Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.3938182Z Traceback (most recent call last):
2025-12-04T10:11:57.3938636Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3939075Z     method(*args, **kwargs)
2025-12-04T10:11:57.3939515Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3939959Z     method(*args, **kwargs)
2025-12-04T10:11:57.3940379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.3940816Z     with policy():
2025-12-04T10:11:57.3941220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.3941681Z     raise RuntimeError(msg)
2025-12-04T10:11:57.3942632Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.3943541Z 
2025-12-04T10:11:57.3943674Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.3944433Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3945056Z 
2025-12-04T10:11:57.3945218Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.3945593Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3945899Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3946430Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3946987Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3947255Z graph_break []
2025-12-04T10:11:57.3947579Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.3948675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.3949769Z   if out == self.unknown_value:
2025-12-04T10:11:57.3950023Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3950321Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3950618Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3951177Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3951653Z graph_break []
2025-12-04T10:11:57.3951835Z =================================== FAILURES ===================================
2025-12-04T10:11:57.3952327Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.3952793Z Traceback (most recent call last):
2025-12-04T10:11:57.3953244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3953695Z     method(*args, **kwargs)
2025-12-04T10:11:57.3954112Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3954548Z     method(*args, **kwargs)
2025-12-04T10:11:57.3954965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.3955399Z     with policy():
2025-12-04T10:11:57.3955802Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.3956241Z     raise RuntimeError(msg)
2025-12-04T10:11:57.3957190Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.3958110Z 
2025-12-04T10:11:57.3958241Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.3958993Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3959603Z 
2025-12-04T10:11:57.3959769Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.3960213Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3960523Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3961050Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3961599Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3961868Z graph_break []
2025-12-04T10:11:57.3962090Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.3963003Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.3963839Z   if out == self.unknown_value:
2025-12-04T10:11:57.3964098Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3964480Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3964784Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3965339Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3965912Z graph_break []
2025-12-04T10:11:57.3966136Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3966433Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3966730Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3967286Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3967770Z graph_break []
2025-12-04T10:11:57.3968352Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f43d696b91c68e27.xml -
2025-12-04T10:11:57.3969021Z =========================== short test summary info ============================
2025-12-04T10:11:57.3970542Z FAILED [0.5630s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.3971927Z 
2025-12-04T10:11:57.3972055Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.3972806Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3973415Z 
2025-12-04T10:11:57.3973577Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.3973923Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.3974229Z ================== 1 failed, 57 deselected, 2 rerun in 12.40s ==================
2025-12-04T10:11:57.3974490Z Got exit code 1
2025-12-04T10:11:57.3975071Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3975876Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.3976463Z W1204 09:27:29.093000 34706 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.3977184Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32d3a27d38e00e52.xml
2025-12-04T10:11:57.3977738Z ============================= test session starts ==============================
2025-12-04T10:11:57.3978141Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.3978510Z cachedir: .pytest_cache
2025-12-04T10:11:57.3978928Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.3979383Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.3979596Z configfile: pytest.ini
2025-12-04T10:11:57.3980025Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.3980616Z collecting ... collected 58 items / 1 deselected / 57 selected
2025-12-04T10:11:57.3980910Z stepcurrent: skipping 1 already run items.
2025-12-04T10:11:57.3981138Z Running 57 items in this shard
2025-12-04T10:11:57.3981261Z 
2025-12-04T10:11:57.3981782Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9997s] [  1%]
2025-12-04T10:11:57.3982936Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6045s] [  1%]
2025-12-04T10:11:57.3983978Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.5944s] [  1%]
2025-12-04T10:11:57.3984520Z 
2025-12-04T10:11:57.3984608Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.3985090Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.3985546Z Traceback (most recent call last):
2025-12-04T10:11:57.3986002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3986454Z     method(*args, **kwargs)
2025-12-04T10:11:57.3986876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3987311Z     method(*args, **kwargs)
2025-12-04T10:11:57.3987725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.3988168Z     with policy():
2025-12-04T10:11:57.3988569Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.3989010Z     raise RuntimeError(msg)
2025-12-04T10:11:57.3989957Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.3990855Z 
2025-12-04T10:11:57.3990996Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.3991748Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.3992355Z 
2025-12-04T10:11:57.3992515Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.3992885Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.3993197Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.3993728Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.3994290Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.3994555Z graph_break []
2025-12-04T10:11:57.3994957Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.3995413Z Traceback (most recent call last):
2025-12-04T10:11:57.3995860Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3996295Z     method(*args, **kwargs)
2025-12-04T10:11:57.3996706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.3997234Z     method(*args, **kwargs)
2025-12-04T10:11:57.3997644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.3998074Z     with policy():
2025-12-04T10:11:57.3998467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.3998975Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4000005Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4000904Z 
2025-12-04T10:11:57.4001035Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4001776Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4002382Z 
2025-12-04T10:11:57.4002539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4002906Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4003208Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4003735Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4004281Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4004545Z graph_break []
2025-12-04T10:11:57.4004757Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4005058Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4005347Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4005902Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4006385Z graph_break []
2025-12-04T10:11:57.4006552Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4007029Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4007484Z Traceback (most recent call last):
2025-12-04T10:11:57.4007935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4008376Z     method(*args, **kwargs)
2025-12-04T10:11:57.4008798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4009244Z     method(*args, **kwargs)
2025-12-04T10:11:57.4009647Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4010080Z     with policy():
2025-12-04T10:11:57.4010475Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4010917Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4011866Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4012771Z 
2025-12-04T10:11:57.4012899Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4013714Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4014329Z 
2025-12-04T10:11:57.4014494Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4014922Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4015217Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4015740Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4016290Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4016550Z graph_break []
2025-12-04T10:11:57.4016772Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4017299Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4017604Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4018157Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4018647Z graph_break []
2025-12-04T10:11:57.4018865Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4019163Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4019456Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4019998Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4020480Z graph_break []
2025-12-04T10:11:57.4021064Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32d3a27d38e00e52.xml -
2025-12-04T10:11:57.4021727Z =========================== short test summary info ============================
2025-12-04T10:11:57.4023231Z FAILED [0.5944s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4024614Z 
2025-12-04T10:11:57.4024744Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4025488Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4026097Z 
2025-12-04T10:11:57.4026259Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4026597Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4026917Z =================== 1 failed, 1 deselected, 2 rerun in 3.22s ===================
2025-12-04T10:11:57.4027167Z Got exit code 1
2025-12-04T10:11:57.4027320Z Retrying single test...
2025-12-04T10:11:57.4027699Z W1204 09:27:38.843000 34888 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4028423Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d971c2b5fa40f28c.xml
2025-12-04T10:11:57.4028985Z ============================= test session starts ==============================
2025-12-04T10:11:57.4029486Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4029839Z cachedir: .pytest_cache
2025-12-04T10:11:57.4030252Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4030803Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4031012Z configfile: pytest.ini
2025-12-04T10:11:57.4031443Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4031962Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4032751Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4033475Z Running 1 items in this shard
2025-12-04T10:11:57.4033604Z 
2025-12-04T10:11:57.4034473Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:40.949417502 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4035292Z 
2025-12-04T10:11:57.4035610Z [W1204 09:27:49.069140543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4035988Z 
2025-12-04T10:11:57.4036290Z [W1204 09:27:49.069422077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4036656Z 
2025-12-04T10:11:57.4036946Z [W1204 09:27:49.069988306 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4037321Z 
2025-12-04T10:11:57.4037656Z [W1204 09:27:49.070224220 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4038096Z 
2025-12-04T10:11:57.4038438Z [W1204 09:27:49.071513610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4038933Z 
2025-12-04T10:11:57.4039283Z [W1204 09:27:49.071711683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4039719Z 
2025-12-04T10:11:57.4040064Z [W1204 09:27:49.072040268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4040436Z 
2025-12-04T10:11:57.4040726Z [W1204 09:27:49.072230831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4041099Z 
2025-12-04T10:11:57.4041390Z [W1204 09:27:49.080743223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4041762Z 
2025-12-04T10:11:57.4042054Z [W1204 09:27:49.080965406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4042424Z 
2025-12-04T10:11:57.4042720Z [W1204 09:27:49.081135199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4043086Z 
2025-12-04T10:11:57.4043379Z [W1204 09:27:49.081368253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4043745Z 
2025-12-04T10:11:57.4044037Z [W1204 09:27:49.081505155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4044409Z 
2025-12-04T10:11:57.4044777Z [W1204 09:27:49.081753389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4045151Z 
2025-12-04T10:11:57.4045439Z [W1204 09:27:49.081906311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4045810Z 
2025-12-04T10:11:57.4046188Z [W1204 09:27:49.082142505 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4046556Z 
2025-12-04T10:11:57.4046849Z [W1204 09:27:49.082280397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4047221Z 
2025-12-04T10:11:57.4047508Z [W1204 09:27:49.170709500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4047879Z 
2025-12-04T10:11:57.4048170Z [W1204 09:27:49.170927173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4048543Z 
2025-12-04T10:11:57.4048831Z [W1204 09:27:49.171078615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4049204Z 
2025-12-04T10:11:57.4049491Z [W1204 09:27:49.171290079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4049859Z 
2025-12-04T10:11:57.4050152Z [W1204 09:27:49.171413541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4050520Z 
2025-12-04T10:11:57.4050809Z [W1204 09:27:49.171630904 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4051187Z 
2025-12-04T10:11:57.4051479Z [W1204 09:27:49.171758116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4051854Z 
2025-12-04T10:11:57.4052148Z [W1204 09:27:49.171961039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4052518Z 
2025-12-04T10:11:57.4052807Z [W1204 09:27:49.172081811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4053179Z 
2025-12-04T10:11:57.4053268Z ('RERUN', {'yellow': True}) [11.1332s] [100%]
2025-12-04T10:11:57.4054204Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:50.402986024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4055018Z 
2025-12-04T10:11:57.4055311Z [W1204 09:27:50.403245848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4055687Z 
2025-12-04T10:11:57.4055977Z [W1204 09:27:50.403400511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4056348Z 
2025-12-04T10:11:57.4056637Z [W1204 09:27:50.403614084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4057005Z 
2025-12-04T10:11:57.4057299Z [W1204 09:27:50.403737906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4057674Z 
2025-12-04T10:11:57.4057977Z [W1204 09:27:50.403958079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4058346Z 
2025-12-04T10:11:57.4058635Z [W1204 09:27:50.404083711 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4059005Z 
2025-12-04T10:11:57.4059366Z [W1204 09:27:50.404287934 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4059741Z 
2025-12-04T10:11:57.4060029Z [W1204 09:27:50.404420296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4060462Z 
2025-12-04T10:11:57.4060757Z [W1204 09:27:50.410670873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4061126Z 
2025-12-04T10:11:57.4061419Z [W1204 09:27:50.410844486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4061796Z 
2025-12-04T10:11:57.4062088Z [W1204 09:27:50.410994738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4062457Z 
2025-12-04T10:11:57.4062750Z [W1204 09:27:50.411199401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4063121Z 
2025-12-04T10:11:57.4063411Z [W1204 09:27:50.411329203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4063781Z 
2025-12-04T10:11:57.4064077Z [W1204 09:27:50.411546347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4064443Z 
2025-12-04T10:11:57.4064736Z [W1204 09:27:50.411676109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4065107Z 
2025-12-04T10:11:57.4065397Z [W1204 09:27:50.411878402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4065768Z 
2025-12-04T10:11:57.4066064Z [W1204 09:27:50.412003374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4066444Z 
2025-12-04T10:11:57.4066736Z [W1204 09:27:50.496256071 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4067111Z 
2025-12-04T10:11:57.4067403Z [W1204 09:27:50.496491995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4067770Z 
2025-12-04T10:11:57.4068074Z [W1204 09:27:50.496643717 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4068447Z 
2025-12-04T10:11:57.4068742Z [W1204 09:27:50.496850880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4069112Z 
2025-12-04T10:11:57.4069405Z [W1204 09:27:50.496978342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4069776Z 
2025-12-04T10:11:57.4070070Z [W1204 09:27:50.497195946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4070449Z 
2025-12-04T10:11:57.4070738Z [W1204 09:27:50.497319357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4071106Z 
2025-12-04T10:11:57.4071401Z [W1204 09:27:50.497521811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4071764Z 
2025-12-04T10:11:57.4072058Z [W1204 09:27:50.497641202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4072425Z 
2025-12-04T10:11:57.4072504Z ('RERUN', {'yellow': True}) [0.5637s] [100%]
2025-12-04T10:11:57.4073468Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:51.966659812 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4074360Z 
2025-12-04T10:11:57.4074658Z [W1204 09:27:51.966875585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4075025Z 
2025-12-04T10:11:57.4075323Z [W1204 09:27:51.967030348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4075690Z 
2025-12-04T10:11:57.4075985Z [W1204 09:27:51.967240971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4076352Z 
2025-12-04T10:11:57.4076642Z [W1204 09:27:51.967366443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4077014Z 
2025-12-04T10:11:57.4077302Z [W1204 09:27:51.967583856 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4077681Z 
2025-12-04T10:11:57.4077969Z [W1204 09:27:51.967706648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4078334Z 
2025-12-04T10:11:57.4078631Z [W1204 09:27:51.967912961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4078997Z 
2025-12-04T10:11:57.4079286Z [W1204 09:27:51.968033353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4079653Z 
2025-12-04T10:11:57.4080011Z [W1204 09:27:51.974189459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4080383Z 
2025-12-04T10:11:57.4080672Z [W1204 09:27:51.974365261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4081044Z 
2025-12-04T10:11:57.4081337Z [W1204 09:27:51.974516304 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4081707Z 
2025-12-04T10:11:57.4081997Z [W1204 09:27:51.974718647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4082363Z 
2025-12-04T10:11:57.4082656Z [W1204 09:27:51.974841449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4083024Z 
2025-12-04T10:11:57.4083314Z [W1204 09:27:51.975055812 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4083687Z 
2025-12-04T10:11:57.4083980Z [W1204 09:27:51.975178864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4084350Z 
2025-12-04T10:11:57.4084637Z [W1204 09:27:51.975395848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4085012Z 
2025-12-04T10:11:57.4085304Z [W1204 09:27:51.975515579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4085672Z 
2025-12-04T10:11:57.4085963Z [W1204 09:27:51.059682155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4086332Z 
2025-12-04T10:11:57.4086627Z [W1204 09:27:51.059867718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4086991Z 
2025-12-04T10:11:57.4087359Z [W1204 09:27:51.060038190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4087742Z 
2025-12-04T10:11:57.4088034Z [W1204 09:27:51.060254184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4088480Z 
2025-12-04T10:11:57.4088768Z [W1204 09:27:51.060391086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4089138Z 
2025-12-04T10:11:57.4089433Z [W1204 09:27:51.060606109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4089799Z 
2025-12-04T10:11:57.4090092Z [W1204 09:27:51.060728991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4090461Z 
2025-12-04T10:11:57.4090754Z [W1204 09:27:51.060927724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4091126Z 
2025-12-04T10:11:57.4091412Z [W1204 09:27:51.061046146 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4091784Z 
2025-12-04T10:11:57.4091845Z FAILED [0.5588s] [100%]
2025-12-04T10:11:57.4091950Z 
2025-12-04T10:11:57.4092041Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4092516Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4092977Z Traceback (most recent call last):
2025-12-04T10:11:57.4093436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4093884Z     method(*args, **kwargs)
2025-12-04T10:11:57.4094297Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4094734Z     method(*args, **kwargs)
2025-12-04T10:11:57.4095138Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4095568Z     with policy():
2025-12-04T10:11:57.4095967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4096412Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4097362Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4098252Z 
2025-12-04T10:11:57.4098387Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4099133Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4099753Z 
2025-12-04T10:11:57.4099915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4100293Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4100603Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4101130Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4101682Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4101947Z graph_break []
2025-12-04T10:11:57.4102244Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4103161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4104081Z   if out == self.unknown_value:
2025-12-04T10:11:57.4104513Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4104970Z Traceback (most recent call last):
2025-12-04T10:11:57.4105415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4105858Z     method(*args, **kwargs)
2025-12-04T10:11:57.4106268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4106702Z     method(*args, **kwargs)
2025-12-04T10:11:57.4107106Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4107534Z     with policy():
2025-12-04T10:11:57.4107931Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4108377Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4109319Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4110214Z 
2025-12-04T10:11:57.4110349Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4111107Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4111719Z 
2025-12-04T10:11:57.4111877Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4112241Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4112543Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4113067Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4113617Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4113882Z graph_break []
2025-12-04T10:11:57.4114099Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4114998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4115839Z   if out == self.unknown_value:
2025-12-04T10:11:57.4116094Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4116392Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4116680Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4117557Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4118050Z graph_break []
2025-12-04T10:11:57.4118229Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4118833Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4119297Z Traceback (most recent call last):
2025-12-04T10:11:57.4119747Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4120332Z     method(*args, **kwargs)
2025-12-04T10:11:57.4120745Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4121184Z     method(*args, **kwargs)
2025-12-04T10:11:57.4121593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4122024Z     with policy():
2025-12-04T10:11:57.4122417Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4122862Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4123808Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4124710Z 
2025-12-04T10:11:57.4124838Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4125607Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4126232Z 
2025-12-04T10:11:57.4126391Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4126760Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4127066Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4127605Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4128158Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4128435Z graph_break []
2025-12-04T10:11:57.4128661Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4129562Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4130398Z   if out == self.unknown_value:
2025-12-04T10:11:57.4130648Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4130946Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4131241Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4131796Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4132274Z graph_break []
2025-12-04T10:11:57.4132494Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4132794Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4133083Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4133625Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4134122Z graph_break []
2025-12-04T10:11:57.4134712Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d971c2b5fa40f28c.xml -
2025-12-04T10:11:57.4135526Z =========================== short test summary info ============================
2025-12-04T10:11:57.4137035Z FAILED [0.5588s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4138487Z 
2025-12-04T10:11:57.4138613Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4139353Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4139967Z 
2025-12-04T10:11:57.4140124Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4140472Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4140781Z ================== 1 failed, 57 deselected, 2 rerun in 12.28s ==================
2025-12-04T10:11:57.4141048Z Got exit code 1
2025-12-04T10:11:57.4141207Z Retrying single test...
2025-12-04T10:11:57.4141578Z W1204 09:27:57.709000 35075 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4142302Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d4e5ee130381ea3.xml
2025-12-04T10:11:57.4142864Z ============================= test session starts ==============================
2025-12-04T10:11:57.4143251Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4143601Z cachedir: .pytest_cache
2025-12-04T10:11:57.4144020Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4144475Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4144692Z configfile: pytest.ini
2025-12-04T10:11:57.4145113Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4145633Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4146425Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4147151Z Running 1 items in this shard
2025-12-04T10:11:57.4147280Z 
2025-12-04T10:11:57.4148034Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:27:58.825054296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4148855Z 
2025-12-04T10:11:57.4149161Z [W1204 09:28:08.011242279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4149551Z 
2025-12-04T10:11:57.4149846Z [W1204 09:28:08.011502094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4150218Z 
2025-12-04T10:11:57.4150515Z [W1204 09:28:08.012093414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4150881Z 
2025-12-04T10:11:57.4151173Z [W1204 09:28:08.012282787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4151615Z 
2025-12-04T10:11:57.4151906Z [W1204 09:28:08.013556118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4152279Z 
2025-12-04T10:11:57.4152570Z [W1204 09:28:08.013750791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4153006Z 
2025-12-04T10:11:57.4153299Z [W1204 09:28:08.014014376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4153665Z 
2025-12-04T10:11:57.4153965Z [W1204 09:28:08.014172918 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4154341Z 
2025-12-04T10:11:57.4154638Z [W1204 09:28:08.022570768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4155012Z 
2025-12-04T10:11:57.4155303Z [W1204 09:28:08.022789742 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4155678Z 
2025-12-04T10:11:57.4155966Z [W1204 09:28:08.022968945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4156341Z 
2025-12-04T10:11:57.4156633Z [W1204 09:28:08.023203039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4157001Z 
2025-12-04T10:11:57.4157296Z [W1204 09:28:08.023351291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4157661Z 
2025-12-04T10:11:57.4157956Z [W1204 09:28:08.023593635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4158329Z 
2025-12-04T10:11:57.4158623Z [W1204 09:28:08.023737138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4158995Z 
2025-12-04T10:11:57.4159287Z [W1204 09:28:08.023973122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4159660Z 
2025-12-04T10:11:57.4159989Z [W1204 09:28:08.024116094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4160362Z 
2025-12-04T10:11:57.4160651Z [W1204 09:28:08.111461288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4161017Z 
2025-12-04T10:11:57.4161310Z [W1204 09:28:08.111675372 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4161677Z 
2025-12-04T10:11:57.4161976Z [W1204 09:28:08.111831764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4162345Z 
2025-12-04T10:11:57.4162632Z [W1204 09:28:08.112041768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4163010Z 
2025-12-04T10:11:57.4163304Z [W1204 09:28:08.112165320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4163678Z 
2025-12-04T10:11:57.4163969Z [W1204 09:28:08.112388564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4164336Z 
2025-12-04T10:11:57.4164648Z [W1204 09:28:08.112513096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4165013Z 
2025-12-04T10:11:57.4165440Z [W1204 09:28:08.112716989 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4165810Z 
2025-12-04T10:11:57.4166101Z [W1204 09:28:08.112839481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4166539Z 
2025-12-04T10:11:57.4166632Z ('RERUN', {'yellow': True}) [11.2058s] [100%]
2025-12-04T10:11:57.4167530Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:09.344623118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4168340Z 
2025-12-04T10:11:57.4168637Z [W1204 09:28:09.344883732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4169006Z 
2025-12-04T10:11:57.4169308Z [W1204 09:28:09.345034504 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4169679Z 
2025-12-04T10:11:57.4169969Z [W1204 09:28:09.345244428 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4170347Z 
2025-12-04T10:11:57.4170635Z [W1204 09:28:09.345372390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4171003Z 
2025-12-04T10:11:57.4171289Z [W1204 09:28:09.345588794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4171657Z 
2025-12-04T10:11:57.4171954Z [W1204 09:28:09.345717676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4172319Z 
2025-12-04T10:11:57.4172619Z [W1204 09:28:09.345921659 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4172986Z 
2025-12-04T10:11:57.4173276Z [W1204 09:28:09.346043381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4173653Z 
2025-12-04T10:11:57.4173944Z [W1204 09:28:09.352252994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4174324Z 
2025-12-04T10:11:57.4174616Z [W1204 09:28:09.352442427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4174986Z 
2025-12-04T10:11:57.4175283Z [W1204 09:28:09.352594560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4175651Z 
2025-12-04T10:11:57.4175947Z [W1204 09:28:09.352796913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4176314Z 
2025-12-04T10:11:57.4176604Z [W1204 09:28:09.352920245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4176982Z 
2025-12-04T10:11:57.4177280Z [W1204 09:28:09.353134279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4177652Z 
2025-12-04T10:11:57.4177940Z [W1204 09:28:09.353258461 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4178311Z 
2025-12-04T10:11:57.4178601Z [W1204 09:28:09.353459474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4178970Z 
2025-12-04T10:11:57.4179269Z [W1204 09:28:09.353586586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4179715Z 
2025-12-04T10:11:57.4180010Z [W1204 09:28:09.437295741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4180378Z 
2025-12-04T10:11:57.4180669Z [W1204 09:28:09.437515364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4181899Z 
2025-12-04T10:11:57.4182191Z [W1204 09:28:09.437666677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4182568Z 
2025-12-04T10:11:57.4182859Z [W1204 09:28:09.437873880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4183226Z 
2025-12-04T10:11:57.4183522Z [W1204 09:28:09.437995302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4183890Z 
2025-12-04T10:11:57.4184188Z [W1204 09:28:09.438207606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4184555Z 
2025-12-04T10:11:57.4184850Z [W1204 09:28:09.438329368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4185227Z 
2025-12-04T10:11:57.4185521Z [W1204 09:28:09.438529831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4185893Z 
2025-12-04T10:11:57.4186185Z [W1204 09:28:09.438651383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4186551Z 
2025-12-04T10:11:57.4186634Z ('RERUN', {'yellow': True}) [0.5623s] [100%]
2025-12-04T10:11:57.4187528Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:09.904736796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4188341Z 
2025-12-04T10:11:57.4188631Z [W1204 09:28:09.904961440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4189009Z 
2025-12-04T10:11:57.4189299Z [W1204 09:28:09.905113692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4189670Z 
2025-12-04T10:11:57.4189959Z [W1204 09:28:09.905325546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4190325Z 
2025-12-04T10:11:57.4190621Z [W1204 09:28:09.905449328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4190990Z 
2025-12-04T10:11:57.4191287Z [W1204 09:28:09.905666231 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4191654Z 
2025-12-04T10:11:57.4191943Z [W1204 09:28:09.905789224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4192334Z 
2025-12-04T10:11:57.4192626Z [W1204 09:28:09.905991427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4192996Z 
2025-12-04T10:11:57.4193284Z [W1204 09:28:09.906112669 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4193650Z 
2025-12-04T10:11:57.4193943Z [W1204 09:28:09.912176880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4194313Z 
2025-12-04T10:11:57.4194683Z [W1204 09:28:09.912361173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4195050Z 
2025-12-04T10:11:57.4195344Z [W1204 09:28:09.912512046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4195777Z 
2025-12-04T10:11:57.4196068Z [W1204 09:28:09.912721359 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4196441Z 
2025-12-04T10:11:57.4196731Z [W1204 09:28:09.912850481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4197102Z 
2025-12-04T10:11:57.4197390Z [W1204 09:28:09.913063935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4197754Z 
2025-12-04T10:11:57.4198049Z [W1204 09:28:09.913190427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4198413Z 
2025-12-04T10:11:57.4198709Z [W1204 09:28:09.913394640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4199082Z 
2025-12-04T10:11:57.4199375Z [W1204 09:28:09.913516633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4199746Z 
2025-12-04T10:11:57.4200092Z [W1204 09:28:10.996962642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4200467Z 
2025-12-04T10:11:57.4200757Z [W1204 09:28:10.997148606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4201123Z 
2025-12-04T10:11:57.4201422Z [W1204 09:28:10.997298138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4201789Z 
2025-12-04T10:11:57.4202085Z [W1204 09:28:10.997502841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4202455Z 
2025-12-04T10:11:57.4202746Z [W1204 09:28:10.997625823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4203116Z 
2025-12-04T10:11:57.4203404Z [W1204 09:28:10.997841037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4203776Z 
2025-12-04T10:11:57.4204064Z [W1204 09:28:10.997968099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4204429Z 
2025-12-04T10:11:57.4204730Z [W1204 09:28:10.998169532 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4205099Z 
2025-12-04T10:11:57.4205393Z [W1204 09:28:10.998290774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4205761Z 
2025-12-04T10:11:57.4205827Z FAILED [0.5607s] [100%]
2025-12-04T10:11:57.4205936Z 
2025-12-04T10:11:57.4206022Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4206498Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4206959Z Traceback (most recent call last):
2025-12-04T10:11:57.4207409Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4207856Z     method(*args, **kwargs)
2025-12-04T10:11:57.4208350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4208791Z     method(*args, **kwargs)
2025-12-04T10:11:57.4209200Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4209636Z     with policy():
2025-12-04T10:11:57.4210104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4257937Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4259000Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4259912Z 
2025-12-04T10:11:57.4260055Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4260829Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4261455Z 
2025-12-04T10:11:57.4261618Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4262014Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4262327Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4262879Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4263436Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4263704Z graph_break []
2025-12-04T10:11:57.4263929Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4264874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4265726Z   if out == self.unknown_value:
2025-12-04T10:11:57.4266178Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4266640Z Traceback (most recent call last):
2025-12-04T10:11:57.4267091Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4267535Z     method(*args, **kwargs)
2025-12-04T10:11:57.4267955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4268388Z     method(*args, **kwargs)
2025-12-04T10:11:57.4268790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4269225Z     with policy():
2025-12-04T10:11:57.4269618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4270062Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4271014Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4271922Z 
2025-12-04T10:11:57.4272059Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4272976Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4273597Z 
2025-12-04T10:11:57.4273765Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4274141Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4274561Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4275096Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4275653Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4275919Z graph_break []
2025-12-04T10:11:57.4276139Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4277059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4277914Z   if out == self.unknown_value:
2025-12-04T10:11:57.4278171Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4278482Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4278781Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4279345Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4279832Z graph_break []
2025-12-04T10:11:57.4280053Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4280541Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4281006Z Traceback (most recent call last):
2025-12-04T10:11:57.4281463Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4281918Z     method(*args, **kwargs)
2025-12-04T10:11:57.4282333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4282783Z     method(*args, **kwargs)
2025-12-04T10:11:57.4283187Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4283617Z     with policy():
2025-12-04T10:11:57.4284010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4284453Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4285400Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4286306Z 
2025-12-04T10:11:57.4286434Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4287188Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4287809Z 
2025-12-04T10:11:57.4287971Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4288337Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4288640Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4289238Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4289790Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4290054Z graph_break []
2025-12-04T10:11:57.4290265Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4291244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4292076Z   if out == self.unknown_value:
2025-12-04T10:11:57.4292333Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4292639Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4292933Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4293495Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4293985Z graph_break []
2025-12-04T10:11:57.4294201Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4294515Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4294813Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4295355Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4295835Z graph_break []
2025-12-04T10:11:57.4296416Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d4e5ee130381ea3.xml -
2025-12-04T10:11:57.4297077Z =========================== short test summary info ============================
2025-12-04T10:11:57.4298595Z FAILED [0.5607s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4299969Z 
2025-12-04T10:11:57.4300101Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4300841Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4301457Z 
2025-12-04T10:11:57.4301623Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4301968Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4302266Z ================== 1 failed, 57 deselected, 2 rerun in 12.35s ==================
2025-12-04T10:11:57.4302532Z Got exit code 1
2025-12-04T10:11:57.4303119Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4303920Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.4304498Z W1204 09:28:16.790000 35262 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4305220Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7f039b6301f03638.xml
2025-12-04T10:11:57.4305858Z ============================= test session starts ==============================
2025-12-04T10:11:57.4306252Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4306601Z cachedir: .pytest_cache
2025-12-04T10:11:57.4307015Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4307569Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4307780Z configfile: pytest.ini
2025-12-04T10:11:57.4308201Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4308729Z collecting ... collected 58 items / 2 deselected / 56 selected
2025-12-04T10:11:57.4309022Z stepcurrent: skipping 2 already run items.
2025-12-04T10:11:57.4309244Z Running 56 items in this shard
2025-12-04T10:11:57.4309370Z 
2025-12-04T10:11:57.4309897Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0043s] [  1%]
2025-12-04T10:11:57.4310993Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5955s] [  1%]
2025-12-04T10:11:57.4312038Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.6029s] [  1%]
2025-12-04T10:11:57.4312577Z 
2025-12-04T10:11:57.4312668Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4313145Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4313604Z Traceback (most recent call last):
2025-12-04T10:11:57.4314062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4314508Z     method(*args, **kwargs)
2025-12-04T10:11:57.4314921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4315363Z     method(*args, **kwargs)
2025-12-04T10:11:57.4315771Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4316202Z     with policy():
2025-12-04T10:11:57.4316601Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4317224Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4318169Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4319058Z 
2025-12-04T10:11:57.4319188Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4319985Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4320605Z 
2025-12-04T10:11:57.4320766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4321133Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4321437Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4321977Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4322648Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4322918Z graph_break []
2025-12-04T10:11:57.4323321Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4323879Z Traceback (most recent call last):
2025-12-04T10:11:57.4324332Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4324785Z     method(*args, **kwargs)
2025-12-04T10:11:57.4325208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4325651Z     method(*args, **kwargs)
2025-12-04T10:11:57.4326064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4326495Z     with policy():
2025-12-04T10:11:57.4326907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4327348Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4328296Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4329202Z 
2025-12-04T10:11:57.4329337Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4330079Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4330696Z 
2025-12-04T10:11:57.4330874Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4331248Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4331559Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4332083Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4332644Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4332909Z graph_break []
2025-12-04T10:11:57.4333131Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4333435Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4333727Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4334284Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4334765Z graph_break []
2025-12-04T10:11:57.4334938Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4335422Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4335896Z Traceback (most recent call last):
2025-12-04T10:11:57.4336350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4336792Z     method(*args, **kwargs)
2025-12-04T10:11:57.4337202Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4337640Z     method(*args, **kwargs)
2025-12-04T10:11:57.4338046Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4338480Z     with policy():
2025-12-04T10:11:57.4338854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4338927Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4339763Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4339831Z 
2025-12-04T10:11:57.4339962Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4340505Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4340509Z 
2025-12-04T10:11:57.4340672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4340804Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4340899Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4341253Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4341395Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4341455Z graph_break []
2025-12-04T10:11:57.4341584Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4341676Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4341798Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4342154Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4342214Z graph_break []
2025-12-04T10:11:57.4342340Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4342435Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4342557Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4342903Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4342961Z graph_break []
2025-12-04T10:11:57.4343444Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7f039b6301f03638.xml -
2025-12-04T10:11:57.4343552Z =========================== short test summary info ============================
2025-12-04T10:11:57.4344880Z FAILED [0.6029s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4344887Z 
2025-12-04T10:11:57.4345018Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4345553Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4345556Z 
2025-12-04T10:11:57.4345724Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4345901Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4346016Z =================== 1 failed, 2 deselected, 2 rerun in 3.23s ===================
2025-12-04T10:11:57.4346080Z Got exit code 1
2025-12-04T10:11:57.4346146Z Retrying single test...
2025-12-04T10:11:57.4346481Z W1204 09:28:26.566000 35444 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4346880Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7547e2319a805dd.xml
2025-12-04T10:11:57.4346977Z ============================= test session starts ==============================
2025-12-04T10:11:57.4347193Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4347260Z cachedir: .pytest_cache
2025-12-04T10:11:57.4347574Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4347654Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4347721Z configfile: pytest.ini
2025-12-04T10:11:57.4348047Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4348177Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4348764Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4348842Z Running 1 items in this shard
2025-12-04T10:11:57.4348846Z 
2025-12-04T10:11:57.4349596Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:27.698347953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4349600Z 
2025-12-04T10:11:57.4349906Z [W1204 09:28:36.851377013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4349912Z 
2025-12-04T10:11:57.4350210Z [W1204 09:28:36.851659058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4350214Z 
2025-12-04T10:11:57.4350505Z [W1204 09:28:36.852242108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4350508Z 
2025-12-04T10:11:57.4350795Z [W1204 09:28:36.852474022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4350798Z 
2025-12-04T10:11:57.4351096Z [W1204 09:28:36.853695653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4351099Z 
2025-12-04T10:11:57.4351385Z [W1204 09:28:36.853869306 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4351391Z 
2025-12-04T10:11:57.4351679Z [W1204 09:28:36.854137190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4351686Z 
2025-12-04T10:11:57.4351977Z [W1204 09:28:36.854298043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4351980Z 
2025-12-04T10:11:57.4352267Z [W1204 09:28:36.862951890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4352271Z 
2025-12-04T10:11:57.4352636Z [W1204 09:28:36.863174744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4352640Z 
2025-12-04T10:11:57.4352930Z [W1204 09:28:36.863350477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4352933Z 
2025-12-04T10:11:57.4353315Z [W1204 09:28:36.863589591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4353320Z 
2025-12-04T10:11:57.4353612Z [W1204 09:28:36.863732553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4353614Z 
2025-12-04T10:11:57.4353905Z [W1204 09:28:36.863974988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4353908Z 
2025-12-04T10:11:57.4354199Z [W1204 09:28:36.864118190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4354205Z 
2025-12-04T10:11:57.4354497Z [W1204 09:28:36.864351024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4354500Z 
2025-12-04T10:11:57.4354786Z [W1204 09:28:36.864477686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4354792Z 
2025-12-04T10:11:57.4355081Z [W1204 09:28:37.953122867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4355088Z 
2025-12-04T10:11:57.4355375Z [W1204 09:28:37.953366841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4355378Z 
2025-12-04T10:11:57.4355667Z [W1204 09:28:37.953537934 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4355670Z 
2025-12-04T10:11:57.4355965Z [W1204 09:28:37.953758458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4355968Z 
2025-12-04T10:11:57.4356263Z [W1204 09:28:37.953883450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4356269Z 
2025-12-04T10:11:57.4356561Z [W1204 09:28:37.954106884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4356565Z 
2025-12-04T10:11:57.4356854Z [W1204 09:28:37.954231206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4356860Z 
2025-12-04T10:11:57.4357147Z [W1204 09:28:37.954436709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4357150Z 
2025-12-04T10:11:57.4357439Z [W1204 09:28:37.954558851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4357443Z 
2025-12-04T10:11:57.4357529Z ('RERUN', {'yellow': True}) [11.1880s] [100%]
2025-12-04T10:11:57.4358282Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:38.178125644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4358288Z 
2025-12-04T10:11:57.4358582Z [W1204 09:28:38.178385498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4358585Z 
2025-12-04T10:11:57.4358875Z [W1204 09:28:38.178546261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4358878Z 
2025-12-04T10:11:57.4359244Z [W1204 09:28:38.178757235 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4359248Z 
2025-12-04T10:11:57.4359539Z [W1204 09:28:38.178887237 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4359607Z 
2025-12-04T10:11:57.4359944Z [W1204 09:28:38.179105601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4359948Z 
2025-12-04T10:11:57.4360241Z [W1204 09:28:38.179231523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4360245Z 
2025-12-04T10:11:57.4360533Z [W1204 09:28:38.179438786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4360536Z 
2025-12-04T10:11:57.4360830Z [W1204 09:28:38.179561598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4360833Z 
2025-12-04T10:11:57.4361120Z [W1204 09:28:38.185660921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4361126Z 
2025-12-04T10:11:57.4361416Z [W1204 09:28:38.185833584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4361419Z 
2025-12-04T10:11:57.4361705Z [W1204 09:28:38.185985407 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4361708Z 
2025-12-04T10:11:57.4361996Z [W1204 09:28:38.186191970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4361999Z 
2025-12-04T10:11:57.4362291Z [W1204 09:28:38.186314732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4362294Z 
2025-12-04T10:11:57.4362583Z [W1204 09:28:38.186533006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4362590Z 
2025-12-04T10:11:57.4362875Z [W1204 09:28:38.186660329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4362878Z 
2025-12-04T10:11:57.4363163Z [W1204 09:28:38.186864252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4363169Z 
2025-12-04T10:11:57.4363453Z [W1204 09:28:38.186988254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4363457Z 
2025-12-04T10:11:57.4363744Z [W1204 09:28:38.268493464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4363747Z 
2025-12-04T10:11:57.4364040Z [W1204 09:28:38.268716128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4364046Z 
2025-12-04T10:11:57.4364335Z [W1204 09:28:38.268869891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4364338Z 
2025-12-04T10:11:57.4364629Z [W1204 09:28:38.269075734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4364632Z 
2025-12-04T10:11:57.4364919Z [W1204 09:28:38.269204096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4364922Z 
2025-12-04T10:11:57.4365286Z [W1204 09:28:38.269415390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4365290Z 
2025-12-04T10:11:57.4365576Z [W1204 09:28:38.269539462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4365579Z 
2025-12-04T10:11:57.4365935Z [W1204 09:28:38.269745605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4365939Z 
2025-12-04T10:11:57.4366229Z [W1204 09:28:38.269867707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4366233Z 
2025-12-04T10:11:57.4366313Z ('RERUN', {'yellow': True}) [0.5509s] [100%]
2025-12-04T10:11:57.4367063Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:38.728016517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4367067Z 
2025-12-04T10:11:57.4367355Z [W1204 09:28:38.728228120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4367358Z 
2025-12-04T10:11:57.4367685Z [W1204 09:28:38.728395273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4367689Z 
2025-12-04T10:11:57.4368030Z [W1204 09:28:38.728622327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4368033Z 
2025-12-04T10:11:57.4368375Z [W1204 09:28:38.728750009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4368379Z 
2025-12-04T10:11:57.4368720Z [W1204 09:28:38.728968203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4368723Z 
2025-12-04T10:11:57.4369076Z [W1204 09:28:38.729091525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4369080Z 
2025-12-04T10:11:57.4369418Z [W1204 09:28:38.729300199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4369425Z 
2025-12-04T10:11:57.4369767Z [W1204 09:28:38.729422441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4369771Z 
2025-12-04T10:11:57.4370080Z [W1204 09:28:38.735424492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4370083Z 
2025-12-04T10:11:57.4370371Z [W1204 09:28:38.735597985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4370377Z 
2025-12-04T10:11:57.4370672Z [W1204 09:28:38.735751138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4370675Z 
2025-12-04T10:11:57.4370964Z [W1204 09:28:38.735957301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4370970Z 
2025-12-04T10:11:57.4371261Z [W1204 09:28:38.736081213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4371264Z 
2025-12-04T10:11:57.4371552Z [W1204 09:28:38.736305957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4371555Z 
2025-12-04T10:11:57.4371846Z [W1204 09:28:38.736432049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4371849Z 
2025-12-04T10:11:57.4372213Z [W1204 09:28:38.736641363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4372217Z 
2025-12-04T10:11:57.4372509Z [W1204 09:28:38.736763855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4372577Z 
2025-12-04T10:11:57.4372865Z [W1204 09:28:38.818823964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4372868Z 
2025-12-04T10:11:57.4373158Z [W1204 09:28:38.819014158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4373165Z 
2025-12-04T10:11:57.4373454Z [W1204 09:28:38.819166990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4373457Z 
2025-12-04T10:11:57.4373756Z [W1204 09:28:38.819371504 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4373759Z 
2025-12-04T10:11:57.4374058Z [W1204 09:28:38.819493586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4374064Z 
2025-12-04T10:11:57.4374351Z [W1204 09:28:38.819707899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4374354Z 
2025-12-04T10:11:57.4374648Z [W1204 09:28:38.819830901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4374651Z 
2025-12-04T10:11:57.4374939Z [W1204 09:28:38.820055615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4374942Z 
2025-12-04T10:11:57.4375238Z [W1204 09:28:38.820185158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4375241Z 
2025-12-04T10:11:57.4375303Z FAILED [0.5554s] [100%]
2025-12-04T10:11:57.4375307Z 
2025-12-04T10:11:57.4375397Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4375715Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4375792Z Traceback (most recent call last):
2025-12-04T10:11:57.4376111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4376177Z     method(*args, **kwargs)
2025-12-04T10:11:57.4376472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4376541Z     method(*args, **kwargs)
2025-12-04T10:11:57.4376835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4376901Z     with policy():
2025-12-04T10:11:57.4377196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4377265Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4378097Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4378101Z 
2025-12-04T10:11:57.4378229Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4378870Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4378875Z 
2025-12-04T10:11:57.4379038Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4379172Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4379350Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4379703Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4379839Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4379898Z graph_break []
2025-12-04T10:11:57.4380023Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4380726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4380801Z   if out == self.unknown_value:
2025-12-04T10:11:57.4381119Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4381196Z Traceback (most recent call last):
2025-12-04T10:11:57.4381497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4381568Z     method(*args, **kwargs)
2025-12-04T10:11:57.4381863Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4381924Z     method(*args, **kwargs)
2025-12-04T10:11:57.4382220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4382281Z     with policy():
2025-12-04T10:11:57.4382596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4382664Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4383502Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4383512Z 
2025-12-04T10:11:57.4383641Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4384177Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4384181Z 
2025-12-04T10:11:57.4384345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4384474Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4384573Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4384936Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4385064Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4385132Z graph_break []
2025-12-04T10:11:57.4385256Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4385950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4386174Z   if out == self.unknown_value:
2025-12-04T10:11:57.4386301Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4386398Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4386522Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4386932Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4386996Z graph_break []
2025-12-04T10:11:57.4387080Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4387390Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4387467Z Traceback (most recent call last):
2025-12-04T10:11:57.4387775Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4387842Z     method(*args, **kwargs)
2025-12-04T10:11:57.4388135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4388198Z     method(*args, **kwargs)
2025-12-04T10:11:57.4388491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4388550Z     with policy():
2025-12-04T10:11:57.4388850Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4388915Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4389746Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4389751Z 
2025-12-04T10:11:57.4389881Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4390420Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4390427Z 
2025-12-04T10:11:57.4390587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4390711Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4390805Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4391153Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4391286Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4391357Z graph_break []
2025-12-04T10:11:57.4391482Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4392175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4392251Z   if out == self.unknown_value:
2025-12-04T10:11:57.4392373Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4392469Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4392591Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4392939Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4393091Z graph_break []
2025-12-04T10:11:57.4393217Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4393310Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4393436Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4393838Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4393900Z graph_break []
2025-12-04T10:11:57.4394386Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7547e2319a805dd.xml -
2025-12-04T10:11:57.4394484Z =========================== short test summary info ============================
2025-12-04T10:11:57.4395815Z FAILED [0.5554s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4395821Z 
2025-12-04T10:11:57.4395946Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4396486Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4396490Z 
2025-12-04T10:11:57.4396647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4396753Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4396882Z ================== 1 failed, 57 deselected, 2 rerun in 12.32s ==================
2025-12-04T10:11:57.4396942Z Got exit code 1
2025-12-04T10:11:57.4397016Z Retrying single test...
2025-12-04T10:11:57.4397281Z W1204 09:28:45.493000 35631 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4397675Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f314cd6b44b1cdb.xml
2025-12-04T10:11:57.4397772Z ============================= test session starts ==============================
2025-12-04T10:11:57.4397981Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4398052Z cachedir: .pytest_cache
2025-12-04T10:11:57.4398358Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4398438Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4398511Z configfile: pytest.ini
2025-12-04T10:11:57.4398827Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4398956Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4399548Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4399619Z Running 1 items in this shard
2025-12-04T10:11:57.4399623Z 
2025-12-04T10:11:57.4400424Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:46.608875565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4400503Z 
2025-12-04T10:11:57.4400808Z [W1204 09:28:55.661299219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4400811Z 
2025-12-04T10:11:57.4401108Z [W1204 09:28:55.661568154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4401186Z 
2025-12-04T10:11:57.4401480Z [W1204 09:28:55.662137653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4401483Z 
2025-12-04T10:11:57.4401776Z [W1204 09:28:55.662353097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4401779Z 
2025-12-04T10:11:57.4402069Z [W1204 09:28:55.663679850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4402072Z 
2025-12-04T10:11:57.4402370Z [W1204 09:28:55.663908984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4402374Z 
2025-12-04T10:11:57.4402660Z [W1204 09:28:55.664208669 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4402665Z 
2025-12-04T10:11:57.4402956Z [W1204 09:28:55.664381043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4402959Z 
2025-12-04T10:11:57.4403249Z [W1204 09:28:55.672792299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4403252Z 
2025-12-04T10:11:57.4403540Z [W1204 09:28:55.672992902 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4403544Z 
2025-12-04T10:11:57.4403838Z [W1204 09:28:55.673168125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4403841Z 
2025-12-04T10:11:57.4404127Z [W1204 09:28:55.673405700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4404133Z 
2025-12-04T10:11:57.4404425Z [W1204 09:28:55.673555352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4404428Z 
2025-12-04T10:11:57.4404715Z [W1204 09:28:55.673794966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4404718Z 
2025-12-04T10:11:57.4405014Z [W1204 09:28:55.673931609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4405017Z 
2025-12-04T10:11:57.4405307Z [W1204 09:28:55.674154102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4405310Z 
2025-12-04T10:11:57.4405605Z [W1204 09:28:55.674281285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4405610Z 
2025-12-04T10:11:57.4405901Z [W1204 09:28:55.762080002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4405905Z 
2025-12-04T10:11:57.4406193Z [W1204 09:28:55.762301165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4406201Z 
2025-12-04T10:11:57.4406491Z [W1204 09:28:55.762455568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4406495Z 
2025-12-04T10:11:57.4406868Z [W1204 09:28:55.762665382 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4406872Z 
2025-12-04T10:11:57.4407165Z [W1204 09:28:55.762796944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4407233Z 
2025-12-04T10:11:57.4407520Z [W1204 09:28:55.763013458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4407524Z 
2025-12-04T10:11:57.4407817Z [W1204 09:28:55.763142390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4407820Z 
2025-12-04T10:11:57.4408122Z [W1204 09:28:55.763349084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4408126Z 
2025-12-04T10:11:57.4408425Z [W1204 09:28:55.763473406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4408429Z 
2025-12-04T10:11:57.4408511Z ('RERUN', {'yellow': True}) [11.0721s] [100%]
2025-12-04T10:11:57.4409262Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:57.998558854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4409269Z 
2025-12-04T10:11:57.4409557Z [W1204 09:28:57.998823368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4409560Z 
2025-12-04T10:11:57.4409851Z [W1204 09:28:57.998973751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4409860Z 
2025-12-04T10:11:57.4410150Z [W1204 09:28:57.999189705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4410153Z 
2025-12-04T10:11:57.4410444Z [W1204 09:28:57.999313617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4410450Z 
2025-12-04T10:11:57.4410742Z [W1204 09:28:57.999532251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4410745Z 
2025-12-04T10:11:57.4411035Z [W1204 09:28:57.999660063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4411038Z 
2025-12-04T10:11:57.4411332Z [W1204 09:28:57.999863996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4411335Z 
2025-12-04T10:11:57.4411626Z [W1204 09:28:57.999986858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4411629Z 
2025-12-04T10:11:57.4411923Z [W1204 09:28:57.006375039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4411926Z 
2025-12-04T10:11:57.4412218Z [W1204 09:28:57.006557693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4412221Z 
2025-12-04T10:11:57.4412513Z [W1204 09:28:57.006711005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4412516Z 
2025-12-04T10:11:57.4412805Z [W1204 09:28:57.006915919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4412808Z 
2025-12-04T10:11:57.4413101Z [W1204 09:28:57.007044801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4413173Z 
2025-12-04T10:11:57.4413474Z [W1204 09:28:57.007262295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4413478Z 
2025-12-04T10:11:57.4413768Z [W1204 09:28:57.007389127 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4413835Z 
2025-12-04T10:11:57.4414129Z [W1204 09:28:57.007596911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4414132Z 
2025-12-04T10:11:57.4414421Z [W1204 09:28:57.007725333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4414424Z 
2025-12-04T10:11:57.4414715Z [W1204 09:28:57.093708618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4414719Z 
2025-12-04T10:11:57.4415011Z [W1204 09:28:57.093928122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4415014Z 
2025-12-04T10:11:57.4415311Z [W1204 09:28:57.094077664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4415317Z 
2025-12-04T10:11:57.4415607Z [W1204 09:28:57.094291608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4415610Z 
2025-12-04T10:11:57.4415899Z [W1204 09:28:57.094420730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4415907Z 
2025-12-04T10:11:57.4416198Z [W1204 09:28:57.094637824 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4416201Z 
2025-12-04T10:11:57.4416496Z [W1204 09:28:57.094762416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4416499Z 
2025-12-04T10:11:57.4416791Z [W1204 09:28:57.094966210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4416797Z 
2025-12-04T10:11:57.4417234Z [W1204 09:28:57.095090702 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4417238Z 
2025-12-04T10:11:57.4417334Z ('RERUN', {'yellow': True}) [0.5724s] [100%]
2025-12-04T10:11:57.4418088Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:28:57.567704446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4418092Z 
2025-12-04T10:11:57.4418391Z [W1204 09:28:57.567919399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4418394Z 
2025-12-04T10:11:57.4418686Z [W1204 09:28:57.568071772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4418693Z 
2025-12-04T10:11:57.4418983Z [W1204 09:28:57.568282506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4418986Z 
2025-12-04T10:11:57.4419274Z [W1204 09:28:57.568423288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4419277Z 
2025-12-04T10:11:57.4419565Z [W1204 09:28:57.568646442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4419572Z 
2025-12-04T10:11:57.4419969Z [W1204 09:28:57.568772654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4419972Z 
2025-12-04T10:11:57.4420261Z [W1204 09:28:57.568982418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4420352Z 
2025-12-04T10:11:57.4420647Z [W1204 09:28:57.569108400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4420650Z 
2025-12-04T10:11:57.4420939Z [W1204 09:28:57.575280378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4420942Z 
2025-12-04T10:11:57.4421240Z [W1204 09:28:57.575456221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4421243Z 
2025-12-04T10:11:57.4421536Z [W1204 09:28:57.575610083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4421540Z 
2025-12-04T10:11:57.4421834Z [W1204 09:28:57.575814627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4421839Z 
2025-12-04T10:11:57.4422126Z [W1204 09:28:57.575940029 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4422129Z 
2025-12-04T10:11:57.4422422Z [W1204 09:28:57.576154503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4422425Z 
2025-12-04T10:11:57.4422713Z [W1204 09:28:57.576276895 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4422716Z 
2025-12-04T10:11:57.4423008Z [W1204 09:28:57.576495799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4423011Z 
2025-12-04T10:11:57.4423305Z [W1204 09:28:57.576624211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4423311Z 
2025-12-04T10:11:57.4423602Z [W1204 09:28:57.661066499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4423605Z 
2025-12-04T10:11:57.4423900Z [W1204 09:28:57.661263263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4423903Z 
2025-12-04T10:11:57.4424191Z [W1204 09:28:57.661419905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4424194Z 
2025-12-04T10:11:57.4424489Z [W1204 09:28:57.661629549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4424493Z 
2025-12-04T10:11:57.4424785Z [W1204 09:28:57.661753251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4424789Z 
2025-12-04T10:11:57.4425086Z [W1204 09:28:57.661969145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4425089Z 
2025-12-04T10:11:57.4425379Z [W1204 09:28:57.662093577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4425382Z 
2025-12-04T10:11:57.4425672Z [W1204 09:28:57.662297521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4425681Z 
2025-12-04T10:11:57.4425971Z [W1204 09:28:57.662419823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4426046Z 
2025-12-04T10:11:57.4426110Z FAILED [0.5624s] [100%]
2025-12-04T10:11:57.4426114Z 
2025-12-04T10:11:57.4426205Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4426519Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4426691Z Traceback (most recent call last):
2025-12-04T10:11:57.4427007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4427074Z     method(*args, **kwargs)
2025-12-04T10:11:57.4427382Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4427447Z     method(*args, **kwargs)
2025-12-04T10:11:57.4427740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4427806Z     with policy():
2025-12-04T10:11:57.4428107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4428179Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4429005Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4429008Z 
2025-12-04T10:11:57.4429142Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4429692Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4429695Z 
2025-12-04T10:11:57.4429859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4429994Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4430094Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4430452Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4430589Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4430649Z graph_break []
2025-12-04T10:11:57.4430783Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4431479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4431551Z   if out == self.unknown_value:
2025-12-04T10:11:57.4431866Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4431943Z Traceback (most recent call last):
2025-12-04T10:11:57.4432252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4432324Z     method(*args, **kwargs)
2025-12-04T10:11:57.4432622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4432693Z     method(*args, **kwargs)
2025-12-04T10:11:57.4432985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4433046Z     with policy():
2025-12-04T10:11:57.4433423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4433491Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4434328Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4434398Z 
2025-12-04T10:11:57.4434527Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4435066Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4435070Z 
2025-12-04T10:11:57.4435229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4435359Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4435459Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4435808Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4435942Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4436000Z graph_break []
2025-12-04T10:11:57.4436126Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4436823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4436892Z   if out == self.unknown_value:
2025-12-04T10:11:57.4437016Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4437114Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4437242Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4437599Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4437661Z graph_break []
2025-12-04T10:11:57.4437757Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4438074Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4438150Z Traceback (most recent call last):
2025-12-04T10:11:57.4438460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4438527Z     method(*args, **kwargs)
2025-12-04T10:11:57.4438826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4438896Z     method(*args, **kwargs)
2025-12-04T10:11:57.4439196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4439259Z     with policy():
2025-12-04T10:11:57.4439561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4439628Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4440514Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4440518Z 
2025-12-04T10:11:57.4440724Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4441267Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4441334Z 
2025-12-04T10:11:57.4441497Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4441625Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4441730Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4442083Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4442217Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4442277Z graph_break []
2025-12-04T10:11:57.4442404Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4443102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4443178Z   if out == self.unknown_value:
2025-12-04T10:11:57.4443302Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4443403Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4443526Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4443879Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4443938Z graph_break []
2025-12-04T10:11:57.4444064Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4444162Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4444288Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4444634Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4444704Z graph_break []
2025-12-04T10:11:57.4445194Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f314cd6b44b1cdb.xml -
2025-12-04T10:11:57.4445301Z =========================== short test summary info ============================
2025-12-04T10:11:57.4446627Z FAILED [0.5624s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4446633Z 
2025-12-04T10:11:57.4446777Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4447322Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4447325Z 
2025-12-04T10:11:57.4447491Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4447598Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4447785Z ================== 1 failed, 57 deselected, 2 rerun in 12.23s ==================
2025-12-04T10:11:57.4447850Z Got exit code 1
2025-12-04T10:11:57.4448350Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4448659Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.4448938Z W1204 09:29:04.282000 35818 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4449327Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-31537f65aa77d4f4.xml
2025-12-04T10:11:57.4449428Z ============================= test session starts ==============================
2025-12-04T10:11:57.4449637Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4449703Z cachedir: .pytest_cache
2025-12-04T10:11:57.4450018Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4450097Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4450168Z configfile: pytest.ini
2025-12-04T10:11:57.4450491Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4450620Z collecting ... collected 58 items / 3 deselected / 55 selected
2025-12-04T10:11:57.4450714Z stepcurrent: skipping 3 already run items.
2025-12-04T10:11:57.4450785Z Running 55 items in this shard
2025-12-04T10:11:57.4450788Z 
2025-12-04T10:11:57.4451315Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9943s] [  1%]
2025-12-04T10:11:57.4451824Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5933s] [  1%]
2025-12-04T10:11:57.4452287Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.5945s] [  1%]
2025-12-04T10:11:57.4452293Z 
2025-12-04T10:11:57.4452385Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4452698Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4452778Z Traceback (most recent call last):
2025-12-04T10:11:57.4453095Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4453160Z     method(*args, **kwargs)
2025-12-04T10:11:57.4453467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4453531Z     method(*args, **kwargs)
2025-12-04T10:11:57.4453832Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4453897Z     with policy():
2025-12-04T10:11:57.4454190Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4454268Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4455093Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4455097Z 
2025-12-04T10:11:57.4455305Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4455844Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4455929Z 
2025-12-04T10:11:57.4456090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4456222Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4456319Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4456678Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4456807Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4456868Z graph_break []
2025-12-04T10:11:57.4457186Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4457259Z Traceback (most recent call last):
2025-12-04T10:11:57.4457559Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4457633Z     method(*args, **kwargs)
2025-12-04T10:11:57.4457928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4457996Z     method(*args, **kwargs)
2025-12-04T10:11:57.4458288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4458348Z     with policy():
2025-12-04T10:11:57.4458661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4458728Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4459562Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4459568Z 
2025-12-04T10:11:57.4459698Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4460232Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4460238Z 
2025-12-04T10:11:57.4460397Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4460527Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4460627Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4460979Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4461104Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4461174Z graph_break []
2025-12-04T10:11:57.4461298Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4461395Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4461519Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4461864Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4461930Z graph_break []
2025-12-04T10:11:57.4462014Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4462396Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4462475Z Traceback (most recent call last):
2025-12-04T10:11:57.4462775Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4462916Z     method(*args, **kwargs)
2025-12-04T10:11:57.4463213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4463279Z     method(*args, **kwargs)
2025-12-04T10:11:57.4463572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4463631Z     with policy():
2025-12-04T10:11:57.4463928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4463993Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4464828Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4464834Z 
2025-12-04T10:11:57.4464963Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4465496Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4465501Z 
2025-12-04T10:11:57.4465660Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4465787Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4465884Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4466235Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4466360Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4466425Z graph_break []
2025-12-04T10:11:57.4466551Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4466643Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4466768Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4467112Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4467170Z graph_break []
2025-12-04T10:11:57.4467299Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4467396Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4467525Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4467873Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4467935Z graph_break []
2025-12-04T10:11:57.4468427Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-31537f65aa77d4f4.xml -
2025-12-04T10:11:57.4468526Z =========================== short test summary info ============================
2025-12-04T10:11:57.4469926Z FAILED [0.5945s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4469993Z 
2025-12-04T10:11:57.4470120Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4470662Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4470665Z 
2025-12-04T10:11:57.4470820Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4470925Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4471044Z =================== 1 failed, 3 deselected, 2 rerun in 3.21s ===================
2025-12-04T10:11:57.4471104Z Got exit code 1
2025-12-04T10:11:57.4471175Z Retrying single test...
2025-12-04T10:11:57.4471439Z W1204 09:29:14.020000 36000 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4471823Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11e8fb2fd4357c15.xml
2025-12-04T10:11:57.4471927Z ============================= test session starts ==============================
2025-12-04T10:11:57.4472136Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4472204Z cachedir: .pytest_cache
2025-12-04T10:11:57.4472517Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4472595Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4472665Z configfile: pytest.ini
2025-12-04T10:11:57.4472990Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4473121Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4473712Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4473786Z Running 1 items in this shard
2025-12-04T10:11:57.4473789Z 
2025-12-04T10:11:57.4474633Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:29:15.133219758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4474637Z 
2025-12-04T10:11:57.4474943Z [W1204 09:29:24.293553230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4474947Z 
2025-12-04T10:11:57.4475245Z [W1204 09:29:24.293815624 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4475252Z 
2025-12-04T10:11:57.4475543Z [W1204 09:29:24.294385544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4475547Z 
2025-12-04T10:11:57.4475838Z [W1204 09:29:24.294581377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4475846Z 
2025-12-04T10:11:57.4476135Z [W1204 09:29:24.295844949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4476139Z 
2025-12-04T10:11:57.4476500Z [W1204 09:29:24.296006222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4476504Z 
2025-12-04T10:11:57.4476801Z [W1204 09:29:24.296277006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4476804Z 
2025-12-04T10:11:57.4477166Z [W1204 09:29:24.296441719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4477170Z 
2025-12-04T10:11:57.4477467Z [W1204 09:29:24.304929983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4477470Z 
2025-12-04T10:11:57.4477759Z [W1204 09:29:24.305150487 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4477763Z 
2025-12-04T10:11:57.4478062Z [W1204 09:29:24.305330740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4478068Z 
2025-12-04T10:11:57.4478361Z [W1204 09:29:24.305570494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4478364Z 
2025-12-04T10:11:57.4478659Z [W1204 09:29:24.305722347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4478665Z 
2025-12-04T10:11:57.4478953Z [W1204 09:29:24.305964381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4478956Z 
2025-12-04T10:11:57.4479247Z [W1204 09:29:24.306107853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4479257Z 
2025-12-04T10:11:57.4479546Z [W1204 09:29:24.306340477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4479550Z 
2025-12-04T10:11:57.4479843Z [W1204 09:29:24.306492980 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4479846Z 
2025-12-04T10:11:57.4480189Z [W1204 09:29:24.394751314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4480195Z 
2025-12-04T10:11:57.4480485Z [W1204 09:29:24.394969998 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4480488Z 
2025-12-04T10:11:57.4480783Z [W1204 09:29:24.395122420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4480786Z 
2025-12-04T10:11:57.4481077Z [W1204 09:29:24.395334164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4481080Z 
2025-12-04T10:11:57.4481377Z [W1204 09:29:24.395464936 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4481380Z 
2025-12-04T10:11:57.4481670Z [W1204 09:29:24.395683800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4481676Z 
2025-12-04T10:11:57.4481966Z [W1204 09:29:24.395810332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4481969Z 
2025-12-04T10:11:57.4482260Z [W1204 09:29:24.396011855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4482262Z 
2025-12-04T10:11:57.4482550Z [W1204 09:29:24.396135697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4482553Z 
2025-12-04T10:11:57.4482728Z ('RERUN', {'yellow': True}) [11.1854s] [100%]
2025-12-04T10:11:57.4483478Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:29:25.634565905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4483546Z 
2025-12-04T10:11:57.4483846Z [W1204 09:29:25.634822739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4483849Z 
2025-12-04T10:11:57.4484139Z [W1204 09:29:25.634977952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4484142Z 
2025-12-04T10:11:57.4484437Z [W1204 09:29:25.635185106 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4484441Z 
2025-12-04T10:11:57.4484735Z [W1204 09:29:25.635317928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4484738Z 
2025-12-04T10:11:57.4485030Z [W1204 09:29:25.635541432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4485036Z 
2025-12-04T10:11:57.4485325Z [W1204 09:29:25.635668464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4485328Z 
2025-12-04T10:11:57.4485617Z [W1204 09:29:25.635874767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4485624Z 
2025-12-04T10:11:57.4485911Z [W1204 09:29:25.636004220 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4485914Z 
2025-12-04T10:11:57.4486204Z [W1204 09:29:25.642237786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4486207Z 
2025-12-04T10:11:57.4486498Z [W1204 09:29:25.642413609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4486504Z 
2025-12-04T10:11:57.4486794Z [W1204 09:29:25.642563532 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4486798Z 
2025-12-04T10:11:57.4487101Z [W1204 09:29:25.642767225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4487105Z 
2025-12-04T10:11:57.4487400Z [W1204 09:29:25.642899597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4487403Z 
2025-12-04T10:11:57.4487697Z [W1204 09:29:25.643117051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4487700Z 
2025-12-04T10:11:57.4487994Z [W1204 09:29:25.643241983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4488000Z 
2025-12-04T10:11:57.4488296Z [W1204 09:29:25.643449267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4488299Z 
2025-12-04T10:11:57.4488587Z [W1204 09:29:25.643573179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4488590Z 
2025-12-04T10:11:57.4488880Z [W1204 09:29:25.727352767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4488888Z 
2025-12-04T10:11:57.4489249Z [W1204 09:29:25.727573291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4489253Z 
2025-12-04T10:11:57.4489544Z [W1204 09:29:25.727725214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4489547Z 
2025-12-04T10:11:57.4489907Z [W1204 09:29:25.727931568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4489910Z 
2025-12-04T10:11:57.4490198Z [W1204 09:29:25.728059870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4490201Z 
2025-12-04T10:11:57.4490495Z [W1204 09:29:25.728277493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4490499Z 
2025-12-04T10:11:57.4490789Z [W1204 09:29:25.728412496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4490795Z 
2025-12-04T10:11:57.4491090Z [W1204 09:29:25.728624319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4491093Z 
2025-12-04T10:11:57.4491384Z [W1204 09:29:25.728746041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4491390Z 
2025-12-04T10:11:57.4491471Z ('RERUN', {'yellow': True}) [0.5660s] [100%]
2025-12-04T10:11:57.4492216Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:29:26.198356901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4492220Z 
2025-12-04T10:11:57.4492513Z [W1204 09:29:26.198567205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4492516Z 
2025-12-04T10:11:57.4492807Z [W1204 09:29:26.198713397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4492810Z 
2025-12-04T10:11:57.4493102Z [W1204 09:29:26.198922761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4493105Z 
2025-12-04T10:11:57.4493396Z [W1204 09:29:26.199046873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4493399Z 
2025-12-04T10:11:57.4493690Z [W1204 09:29:26.199264777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4493694Z 
2025-12-04T10:11:57.4493989Z [W1204 09:29:26.199387319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4493995Z 
2025-12-04T10:11:57.4494284Z [W1204 09:29:26.199592792 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4494288Z 
2025-12-04T10:11:57.4494581Z [W1204 09:29:26.199714855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4494587Z 
2025-12-04T10:11:57.4494880Z [W1204 09:29:26.205807448 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4494884Z 
2025-12-04T10:11:57.4495174Z [W1204 09:29:26.205981942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4495181Z 
2025-12-04T10:11:57.4495471Z [W1204 09:29:26.206133734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4495474Z 
2025-12-04T10:11:57.4495911Z [W1204 09:29:26.206334868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4495915Z 
2025-12-04T10:11:57.4496211Z [W1204 09:29:26.206462330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4496283Z 
2025-12-04T10:11:57.4496573Z [W1204 09:29:26.206675963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4496576Z 
2025-12-04T10:11:57.4496872Z [W1204 09:29:26.206800385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4496875Z 
2025-12-04T10:11:57.4497168Z [W1204 09:29:26.207004609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4497171Z 
2025-12-04T10:11:57.4497466Z [W1204 09:29:26.207127091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4497469Z 
2025-12-04T10:11:57.4497756Z [W1204 09:29:26.290621435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4497762Z 
2025-12-04T10:11:57.4498055Z [W1204 09:29:26.290824858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4498058Z 
2025-12-04T10:11:57.4498347Z [W1204 09:29:26.290972741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4498350Z 
2025-12-04T10:11:57.4498641Z [W1204 09:29:26.291184785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4498647Z 
2025-12-04T10:11:57.4498937Z [W1204 09:29:26.291310757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4498940Z 
2025-12-04T10:11:57.4499239Z [W1204 09:29:26.291525921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4499245Z 
2025-12-04T10:11:57.4499546Z [W1204 09:29:26.291654513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4499550Z 
2025-12-04T10:11:57.4499838Z [W1204 09:29:26.291854716 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4499842Z 
2025-12-04T10:11:57.4500136Z [W1204 09:29:26.291979238 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4500139Z 
2025-12-04T10:11:57.4500204Z FAILED [0.5608s] [100%]
2025-12-04T10:11:57.4500207Z 
2025-12-04T10:11:57.4500307Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4500628Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4500707Z Traceback (most recent call last):
2025-12-04T10:11:57.4501023Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4501092Z     method(*args, **kwargs)
2025-12-04T10:11:57.4501391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4501458Z     method(*args, **kwargs)
2025-12-04T10:11:57.4501745Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4501810Z     with policy():
2025-12-04T10:11:57.4502195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4502263Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4503089Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4503157Z 
2025-12-04T10:11:57.4503290Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4503851Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4503855Z 
2025-12-04T10:11:57.4504027Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4504161Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4504266Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4504626Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4504765Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4504827Z graph_break []
2025-12-04T10:11:57.4504952Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4505656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4505728Z   if out == self.unknown_value:
2025-12-04T10:11:57.4506048Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4506123Z Traceback (most recent call last):
2025-12-04T10:11:57.4506429Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4506503Z     method(*args, **kwargs)
2025-12-04T10:11:57.4506797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4506865Z     method(*args, **kwargs)
2025-12-04T10:11:57.4507156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4507216Z     with policy():
2025-12-04T10:11:57.4507513Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4507594Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4508434Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4508445Z 
2025-12-04T10:11:57.4508573Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4509112Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4509115Z 
2025-12-04T10:11:57.4509282Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4509408Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4509599Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4509953Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4510083Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4510211Z graph_break []
2025-12-04T10:11:57.4510338Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4511035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4511112Z   if out == self.unknown_value:
2025-12-04T10:11:57.4511234Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4511333Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4511461Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4511805Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4511871Z graph_break []
2025-12-04T10:11:57.4511954Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4512269Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4512346Z Traceback (most recent call last):
2025-12-04T10:11:57.4512644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4512713Z     method(*args, **kwargs)
2025-12-04T10:11:57.4513019Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4513087Z     method(*args, **kwargs)
2025-12-04T10:11:57.4513384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4513445Z     with policy():
2025-12-04T10:11:57.4513750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4513816Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4514650Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4514657Z 
2025-12-04T10:11:57.4514786Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4515329Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4515333Z 
2025-12-04T10:11:57.4515498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4515628Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4515730Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4516076Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4516204Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4516268Z graph_break []
2025-12-04T10:11:57.4516393Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4517338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4517422Z   if out == self.unknown_value:
2025-12-04T10:11:57.4517638Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4517739Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4517868Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4518218Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4518283Z graph_break []
2025-12-04T10:11:57.4518415Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4518517Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4518644Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4518990Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4519057Z graph_break []
2025-12-04T10:11:57.4519549Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11e8fb2fd4357c15.xml -
2025-12-04T10:11:57.4519650Z =========================== short test summary info ============================
2025-12-04T10:11:57.4521072Z FAILED [0.5608s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4521078Z 
2025-12-04T10:11:57.4521205Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4521750Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4521754Z 
2025-12-04T10:11:57.4521912Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4522024Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4522140Z ================== 1 failed, 57 deselected, 2 rerun in 12.34s ==================
2025-12-04T10:11:57.4522199Z Got exit code 1
2025-12-04T10:11:57.4522270Z Retrying single test...
2025-12-04T10:11:57.4522540Z W1204 09:29:32.938000 36187 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4522927Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-75666f3891d9ac7f.xml
2025-12-04T10:11:57.4523026Z ============================= test session starts ==============================
2025-12-04T10:11:57.4523242Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4523316Z cachedir: .pytest_cache
2025-12-04T10:11:57.4523624Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4523704Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4523771Z configfile: pytest.ini
2025-12-04T10:11:57.4524160Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4524297Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4524881Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4525018Z Running 1 items in this shard
2025-12-04T10:11:57.4525026Z 
2025-12-04T10:11:57.4525778Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:29:34.052767129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4525782Z 
2025-12-04T10:11:57.4526087Z [W1204 09:29:43.242482605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4526091Z 
2025-12-04T10:11:57.4526386Z [W1204 09:29:43.242745570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4526390Z 
2025-12-04T10:11:57.4526685Z [W1204 09:29:43.243328300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4526691Z 
2025-12-04T10:11:57.4526984Z [W1204 09:29:43.243514863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4526988Z 
2025-12-04T10:11:57.4527289Z [W1204 09:29:43.244787256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4527293Z 
2025-12-04T10:11:57.4527589Z [W1204 09:29:43.244984599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4527593Z 
2025-12-04T10:11:57.4527885Z [W1204 09:29:43.245307075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4527888Z 
2025-12-04T10:11:57.4528180Z [W1204 09:29:43.245458767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4528185Z 
2025-12-04T10:11:57.4528471Z [W1204 09:29:43.253941066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4528474Z 
2025-12-04T10:11:57.4528762Z [W1204 09:29:43.254161950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4528768Z 
2025-12-04T10:11:57.4529056Z [W1204 09:29:43.254337853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4529059Z 
2025-12-04T10:11:57.4529351Z [W1204 09:29:43.254584387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4529354Z 
2025-12-04T10:11:57.4529645Z [W1204 09:29:43.254730609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4529651Z 
2025-12-04T10:11:57.4529941Z [W1204 09:29:43.254970704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4529944Z 
2025-12-04T10:11:57.4530238Z [W1204 09:29:43.255110466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4530241Z 
2025-12-04T10:11:57.4530529Z [W1204 09:29:43.255341160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4530532Z 
2025-12-04T10:11:57.4530894Z [W1204 09:29:43.255477882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4530898Z 
2025-12-04T10:11:57.4531190Z [W1204 09:29:43.344144693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4531276Z 
2025-12-04T10:11:57.4531570Z [W1204 09:29:43.344381757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4531573Z 
2025-12-04T10:11:57.4531862Z [W1204 09:29:43.344535970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4531866Z 
2025-12-04T10:11:57.4532155Z [W1204 09:29:43.344748773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4532164Z 
2025-12-04T10:11:57.4532455Z [W1204 09:29:43.344879356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4532458Z 
2025-12-04T10:11:57.4532745Z [W1204 09:29:43.345103300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4532751Z 
2025-12-04T10:11:57.4533042Z [W1204 09:29:43.345230662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4533045Z 
2025-12-04T10:11:57.4533333Z [W1204 09:29:43.345441426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4533337Z 
2025-12-04T10:11:57.4533633Z [W1204 09:29:43.345567788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4533636Z 
2025-12-04T10:11:57.4533722Z ('RERUN', {'yellow': True}) [11.2184s] [100%]
2025-12-04T10:11:57.4534472Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:29:44.583126734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4534478Z 
2025-12-04T10:11:57.4534768Z [W1204 09:29:44.583388908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4534771Z 
2025-12-04T10:11:57.4535074Z [W1204 09:29:44.583548861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4535077Z 
2025-12-04T10:11:57.4535373Z [W1204 09:29:44.583761845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4535377Z 
2025-12-04T10:11:57.4535667Z [W1204 09:29:44.583892267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4535671Z 
2025-12-04T10:11:57.4535963Z [W1204 09:29:44.584113651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4535966Z 
2025-12-04T10:11:57.4536258Z [W1204 09:29:44.584244483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4536261Z 
2025-12-04T10:11:57.4536554Z [W1204 09:29:44.584460587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4536557Z 
2025-12-04T10:11:57.4536847Z [W1204 09:29:44.584585529 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4536850Z 
2025-12-04T10:11:57.4537142Z [W1204 09:29:44.590795267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4537215Z 
2025-12-04T10:11:57.4537504Z [W1204 09:29:44.590970360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4537507Z 
2025-12-04T10:11:57.4537799Z [W1204 09:29:44.591120033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4537866Z 
2025-12-04T10:11:57.4538155Z [W1204 09:29:44.591320916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4538159Z 
2025-12-04T10:11:57.4538448Z [W1204 09:29:44.591444008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4538455Z 
2025-12-04T10:11:57.4538742Z [W1204 09:29:44.591661602 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4538745Z 
2025-12-04T10:11:57.4539035Z [W1204 09:29:44.591789954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4539039Z 
2025-12-04T10:11:57.4539330Z [W1204 09:29:44.592000458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4539336Z 
2025-12-04T10:11:57.4539623Z [W1204 09:29:44.592126380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4539626Z 
2025-12-04T10:11:57.4539923Z [W1204 09:29:44.675820655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4539926Z 
2025-12-04T10:11:57.4540213Z [W1204 09:29:44.676042209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4540216Z 
2025-12-04T10:11:57.4540511Z [W1204 09:29:44.676195102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4540514Z 
2025-12-04T10:11:57.4540806Z [W1204 09:29:44.676410555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4540812Z 
2025-12-04T10:11:57.4541103Z [W1204 09:29:44.676532587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4541106Z 
2025-12-04T10:11:57.4541394Z [W1204 09:29:44.676751701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4541397Z 
2025-12-04T10:11:57.4541687Z [W1204 09:29:44.676875463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4541696Z 
2025-12-04T10:11:57.4541986Z [W1204 09:29:44.677083917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4541989Z 
2025-12-04T10:11:57.4542276Z [W1204 09:29:44.677205259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4542282Z 
2025-12-04T10:11:57.4542367Z ('RERUN', {'yellow': True}) [0.5642s] [100%]
2025-12-04T10:11:57.4543104Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:29:45.145458829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4543108Z 
2025-12-04T10:11:57.4543401Z [W1204 09:29:45.145676883 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4543404Z 
2025-12-04T10:11:57.4543762Z [W1204 09:29:45.145835185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4543766Z 
2025-12-04T10:11:57.4544060Z [W1204 09:29:45.146048669 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4544127Z 
2025-12-04T10:11:57.4544414Z [W1204 09:29:45.146181322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4544417Z 
2025-12-04T10:11:57.4544708Z [W1204 09:29:45.146401606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4544711Z 
2025-12-04T10:11:57.4544999Z [W1204 09:29:45.146530008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4545002Z 
2025-12-04T10:11:57.4545293Z [W1204 09:29:45.146738211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4545296Z 
2025-12-04T10:11:57.4545588Z [W1204 09:29:45.146861434 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4545594Z 
2025-12-04T10:11:57.4545887Z [W1204 09:29:45.152931389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4545891Z 
2025-12-04T10:11:57.4546180Z [W1204 09:29:45.153108692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4546183Z 
2025-12-04T10:11:57.4546484Z [W1204 09:29:45.153261295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4546487Z 
2025-12-04T10:11:57.4546785Z [W1204 09:29:45.153464249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4546788Z 
2025-12-04T10:11:57.4547078Z [W1204 09:29:45.153586691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4547084Z 
2025-12-04T10:11:57.4547378Z [W1204 09:29:45.153802215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4547381Z 
2025-12-04T10:11:57.4547670Z [W1204 09:29:45.153926817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4547673Z 
2025-12-04T10:11:57.4547960Z [W1204 09:29:45.154131021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4547968Z 
2025-12-04T10:11:57.4548259Z [W1204 09:29:45.154252393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4548262Z 
2025-12-04T10:11:57.4548551Z [W1204 09:29:45.237569960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4548554Z 
2025-12-04T10:11:57.4548852Z [W1204 09:29:45.237759713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4548855Z 
2025-12-04T10:11:57.4549145Z [W1204 09:29:45.237912626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4549148Z 
2025-12-04T10:11:57.4549448Z [W1204 09:29:45.238120079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4549451Z 
2025-12-04T10:11:57.4549810Z [W1204 09:29:45.238245092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4549813Z 
2025-12-04T10:11:57.4550112Z [W1204 09:29:45.238459505 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4550115Z 
2025-12-04T10:11:57.4550405Z [W1204 09:29:45.238580998 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4550472Z 
2025-12-04T10:11:57.4550770Z [W1204 09:29:45.238781881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4550773Z 
2025-12-04T10:11:57.4551063Z [W1204 09:29:45.238900733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4551066Z 
2025-12-04T10:11:57.4551130Z FAILED [0.5590s] [100%]
2025-12-04T10:11:57.4551133Z 
2025-12-04T10:11:57.4551225Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4551545Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4551628Z Traceback (most recent call last):
2025-12-04T10:11:57.4551943Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4552014Z     method(*args, **kwargs)
2025-12-04T10:11:57.4552320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4552386Z     method(*args, **kwargs)
2025-12-04T10:11:57.4552679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4552746Z     with policy():
2025-12-04T10:11:57.4553041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4553114Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4553940Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4553947Z 
2025-12-04T10:11:57.4554083Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4554635Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4554638Z 
2025-12-04T10:11:57.4554807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4554943Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4555045Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4555409Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4555550Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4555613Z graph_break []
2025-12-04T10:11:57.4555745Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4556446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4556526Z   if out == self.unknown_value:
2025-12-04T10:11:57.4556911Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4556989Z Traceback (most recent call last):
2025-12-04T10:11:57.4557303Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4557369Z     method(*args, **kwargs)
2025-12-04T10:11:57.4557752Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4557822Z     method(*args, **kwargs)
2025-12-04T10:11:57.4558121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4558186Z     with policy():
2025-12-04T10:11:57.4558485Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4558552Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4559396Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4559402Z 
2025-12-04T10:11:57.4559532Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4560129Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4560133Z 
2025-12-04T10:11:57.4560295Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4560424Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4560536Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4560895Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4561031Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4561091Z graph_break []
2025-12-04T10:11:57.4561220Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4561917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4561990Z   if out == self.unknown_value:
2025-12-04T10:11:57.4562123Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4562219Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4562347Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4562703Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4562764Z graph_break []
2025-12-04T10:11:57.4562848Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4563167Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4563242Z Traceback (most recent call last):
2025-12-04T10:11:57.4563552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4563619Z     method(*args, **kwargs)
2025-12-04T10:11:57.4563917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4563988Z     method(*args, **kwargs)
2025-12-04T10:11:57.4564353Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4564422Z     with policy():
2025-12-04T10:11:57.4564721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4564854Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4565692Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4565696Z 
2025-12-04T10:11:57.4565825Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4566369Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4566373Z 
2025-12-04T10:11:57.4566533Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4566672Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4566783Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4567134Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4567265Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4567325Z graph_break []
2025-12-04T10:11:57.4567449Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4568149Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4568223Z   if out == self.unknown_value:
2025-12-04T10:11:57.4568351Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4568451Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4568576Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4568933Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4568993Z graph_break []
2025-12-04T10:11:57.4569120Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4569221Z stats [('calls_captured', 18), ('unique_graphs', 1)]
2025-12-04T10:11:57.4569344Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4569695Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)]
2025-12-04T10:11:57.4569755Z graph_break []
2025-12-04T10:11:57.4570245Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-75666f3891d9ac7f.xml -
2025-12-04T10:11:57.4570359Z =========================== short test summary info ============================
2025-12-04T10:11:57.4571753Z FAILED [0.5590s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4571758Z 
2025-12-04T10:11:57.4571892Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4572432Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4572500Z 
2025-12-04T10:11:57.4572664Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4572772Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4572889Z ================== 1 failed, 57 deselected, 2 rerun in 12.37s ==================
2025-12-04T10:11:57.4572957Z Got exit code 1
2025-12-04T10:11:57.4573454Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4573710Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.4573978Z W1204 09:29:51.887000 36374 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4574371Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bbf4ef91870a527.xml
2025-12-04T10:11:57.4574477Z ============================= test session starts ==============================
2025-12-04T10:11:57.4574690Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4574764Z cachedir: .pytest_cache
2025-12-04T10:11:57.4575075Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4575154Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4575230Z configfile: pytest.ini
2025-12-04T10:11:57.4575547Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4575678Z collecting ... collected 58 items / 4 deselected / 54 selected
2025-12-04T10:11:57.4575788Z stepcurrent: skipping 4 already run items.
2025-12-04T10:11:57.4575861Z Running 54 items in this shard
2025-12-04T10:11:57.4575864Z 
2025-12-04T10:11:57.4576379Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0557s] [  1%]
2025-12-04T10:11:57.4576872Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6275s] [  1%]
2025-12-04T10:11:57.4577331Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.6189s] [  1%]
2025-12-04T10:11:57.4577342Z 
2025-12-04T10:11:57.4577426Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4577733Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4577814Z Traceback (most recent call last):
2025-12-04T10:11:57.4578122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4578192Z     method(*args, **kwargs)
2025-12-04T10:11:57.4578492Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4578557Z     method(*args, **kwargs)
2025-12-04T10:11:57.4578923Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4578986Z     with policy():
2025-12-04T10:11:57.4579285Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4579427Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4580248Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4580252Z 
2025-12-04T10:11:57.4580387Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4580913Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4580917Z 
2025-12-04T10:11:57.4581076Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4581209Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4581309Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4581667Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4581795Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4581854Z graph_break []
2025-12-04T10:11:57.4582156Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4582233Z Traceback (most recent call last):
2025-12-04T10:11:57.4582540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4582607Z     method(*args, **kwargs)
2025-12-04T10:11:57.4582897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4582970Z     method(*args, **kwargs)
2025-12-04T10:11:57.4583260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4583320Z     with policy():
2025-12-04T10:11:57.4583622Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4583689Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4584534Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4584537Z 
2025-12-04T10:11:57.4584664Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4585194Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4585200Z 
2025-12-04T10:11:57.4585358Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4585488Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4585599Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4585950Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4586170Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4586232Z graph_break []
2025-12-04T10:11:57.4586358Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4586455Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4586648Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4586995Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4587061Z graph_break []
2025-12-04T10:11:57.4587147Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4587449Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4587523Z Traceback (most recent call last):
2025-12-04T10:11:57.4587826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4587910Z     method(*args, **kwargs)
2025-12-04T10:11:57.4588206Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4588274Z     method(*args, **kwargs)
2025-12-04T10:11:57.4588571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4588632Z     with policy():
2025-12-04T10:11:57.4588934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4589004Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4589838Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4589843Z 
2025-12-04T10:11:57.4589976Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4590501Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4590508Z 
2025-12-04T10:11:57.4590672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4590799Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4590890Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4591242Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4591371Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4591444Z graph_break []
2025-12-04T10:11:57.4591576Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4591670Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4591803Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4592147Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4592216Z graph_break []
2025-12-04T10:11:57.4592343Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4592440Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4592569Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4592987Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4593050Z graph_break []
2025-12-04T10:11:57.4593549Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bbf4ef91870a527.xml -
2025-12-04T10:11:57.4593717Z =========================== short test summary info ============================
2025-12-04T10:11:57.4595039Z FAILED [0.6189s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4595046Z 
2025-12-04T10:11:57.4595174Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4595709Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4595715Z 
2025-12-04T10:11:57.4595876Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4595982Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4596118Z =================== 1 failed, 4 deselected, 2 rerun in 3.33s ===================
2025-12-04T10:11:57.4596181Z Got exit code 1
2025-12-04T10:11:57.4596254Z Retrying single test...
2025-12-04T10:11:57.4596519Z W1204 09:30:01.748000 36563 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4596911Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-144df5003ab71cee.xml
2025-12-04T10:11:57.4597015Z ============================= test session starts ==============================
2025-12-04T10:11:57.4597227Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4597300Z cachedir: .pytest_cache
2025-12-04T10:11:57.4597611Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4597688Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4597758Z configfile: pytest.ini
2025-12-04T10:11:57.4598076Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4598207Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4598790Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4598863Z Running 1 items in this shard
2025-12-04T10:11:57.4598867Z 
2025-12-04T10:11:57.4599614Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:30:03.182918143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4599618Z 
2025-12-04T10:11:57.4599962Z [W1204 09:30:12.473511348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4599966Z 
2025-12-04T10:11:57.4600270Z [W1204 09:30:12.473769112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4600274Z 
2025-12-04T10:11:57.4600640Z [W1204 09:30:12.479933278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4600643Z 
2025-12-04T10:11:57.4600939Z [W1204 09:30:12.480570129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4601082Z 
2025-12-04T10:11:57.4601374Z [W1204 09:30:12.480762912 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4601377Z 
2025-12-04T10:11:57.4601664Z [W1204 09:30:12.486267086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4601668Z 
2025-12-04T10:11:57.4601961Z [W1204 09:30:12.486829705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4601965Z 
2025-12-04T10:11:57.4602257Z [W1204 09:30:12.487008978 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4602260Z 
2025-12-04T10:11:57.4602348Z ('RERUN', {'yellow': True}) [11.3528s] [100%]
2025-12-04T10:11:57.4603095Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:30:13.665245296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4603102Z 
2025-12-04T10:11:57.4603402Z [W1204 09:30:13.665786476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4603405Z 
2025-12-04T10:11:57.4603696Z [W1204 09:30:13.665931688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4603699Z 
2025-12-04T10:11:57.4603996Z [W1204 09:30:13.668838368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4603999Z 
2025-12-04T10:11:57.4604293Z [W1204 09:30:13.669401907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4604299Z 
2025-12-04T10:11:57.4604589Z [W1204 09:30:13.669543160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4604597Z 
2025-12-04T10:11:57.4604884Z [W1204 09:30:13.674047227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4604887Z 
2025-12-04T10:11:57.4605174Z [W1204 09:30:13.674512786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4605177Z 
2025-12-04T10:11:57.4605474Z [W1204 09:30:13.674649388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4605477Z 
2025-12-04T10:11:57.4605558Z ('RERUN', {'yellow': True}) [0.5986s] [100%]
2025-12-04T10:11:57.4606295Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:30:14.257175233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4606301Z 
2025-12-04T10:11:57.4606591Z [W1204 09:30:14.257719193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4606594Z 
2025-12-04T10:11:57.4606887Z [W1204 09:30:14.257863675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4606890Z 
2025-12-04T10:11:57.4607247Z [W1204 09:30:14.260799495 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4607251Z 
2025-12-04T10:11:57.4607548Z [W1204 09:30:14.261363075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4607623Z 
2025-12-04T10:11:57.4607918Z [W1204 09:30:14.261502938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4607922Z 
2025-12-04T10:11:57.4608211Z [W1204 09:30:14.265992665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4608220Z 
2025-12-04T10:11:57.4608510Z [W1204 09:30:14.266453993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4608513Z 
2025-12-04T10:11:57.4608806Z [W1204 09:30:14.266590625 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4608809Z 
2025-12-04T10:11:57.4608880Z FAILED [0.5960s] [100%]
2025-12-04T10:11:57.4608884Z 
2025-12-04T10:11:57.4608969Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4609277Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4609356Z Traceback (most recent call last):
2025-12-04T10:11:57.4609664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4609740Z     method(*args, **kwargs)
2025-12-04T10:11:57.4610040Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4610106Z     method(*args, **kwargs)
2025-12-04T10:11:57.4610410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4610472Z     with policy():
2025-12-04T10:11:57.4610774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4610844Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4611661Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4611670Z 
2025-12-04T10:11:57.4611800Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4612329Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4612333Z 
2025-12-04T10:11:57.4612499Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4612628Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4612734Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4613094Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4613226Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4613294Z graph_break []
2025-12-04T10:11:57.4613420Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4614217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4614301Z   if out == self.unknown_value:
2025-12-04T10:11:57.4614601Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4614749Z Traceback (most recent call last):
2025-12-04T10:11:57.4615055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4615121Z     method(*args, **kwargs)
2025-12-04T10:11:57.4615424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4615490Z     method(*args, **kwargs)
2025-12-04T10:11:57.4615790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4615850Z     with policy():
2025-12-04T10:11:57.4616152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4616225Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4617191Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4617198Z 
2025-12-04T10:11:57.4617335Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4617864Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4617868Z 
2025-12-04T10:11:57.4618031Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4618179Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4618277Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4618636Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4618765Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4618825Z graph_break []
2025-12-04T10:11:57.4618957Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4619648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4619726Z   if out == self.unknown_value:
2025-12-04T10:11:57.4619856Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4619951Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4620086Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4620432Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4620493Z graph_break []
2025-12-04T10:11:57.4620586Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4620885Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4620968Z Traceback (most recent call last):
2025-12-04T10:11:57.4621265Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4621438Z     method(*args, **kwargs)
2025-12-04T10:11:57.4621739Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4621805Z     method(*args, **kwargs)
2025-12-04T10:11:57.4622095Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4622252Z     with policy():
2025-12-04T10:11:57.4622548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4622622Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4623450Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4623456Z 
2025-12-04T10:11:57.4623587Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4624112Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4624118Z 
2025-12-04T10:11:57.4624280Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4624412Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4624502Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4624852Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4624979Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4625041Z graph_break []
2025-12-04T10:11:57.4625173Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4625865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4625941Z   if out == self.unknown_value:
2025-12-04T10:11:57.4626069Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4626161Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4626289Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4626630Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4626689Z graph_break []
2025-12-04T10:11:57.4626824Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4626914Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4627035Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4627381Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4627442Z graph_break []
2025-12-04T10:11:57.4627935Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-144df5003ab71cee.xml -
2025-12-04T10:11:57.4628037Z =========================== short test summary info ============================
2025-12-04T10:11:57.4629434Z FAILED [0.5960s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4629532Z 
2025-12-04T10:11:57.4629663Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4630193Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4630197Z 
2025-12-04T10:11:57.4630356Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4630462Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4630584Z ================== 1 failed, 57 deselected, 2 rerun in 12.57s ==================
2025-12-04T10:11:57.4630643Z Got exit code 1
2025-12-04T10:11:57.4630709Z Retrying single test...
2025-12-04T10:11:57.4630975Z W1204 09:30:20.920000 36757 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4631367Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5fd5f82c697f5c0c.xml
2025-12-04T10:11:57.4631469Z ============================= test session starts ==============================
2025-12-04T10:11:57.4631680Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4631752Z cachedir: .pytest_cache
2025-12-04T10:11:57.4632063Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4632142Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4632209Z configfile: pytest.ini
2025-12-04T10:11:57.4632532Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4632661Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4633244Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4633316Z Running 1 items in this shard
2025-12-04T10:11:57.4633319Z 
2025-12-04T10:11:57.4634067Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:30:22.416789240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4634071Z 
2025-12-04T10:11:57.4634373Z [W1204 09:30:31.402910727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4634377Z 
2025-12-04T10:11:57.4634675Z [W1204 09:30:31.403153742 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4634686Z 
2025-12-04T10:11:57.4634978Z [W1204 09:30:31.408909661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4634982Z 
2025-12-04T10:11:57.4635274Z [W1204 09:30:31.409459280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4635277Z 
2025-12-04T10:11:57.4635586Z [W1204 09:30:31.409627403 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4635590Z 
2025-12-04T10:11:57.4635954Z [W1204 09:30:31.415148647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4635958Z 
2025-12-04T10:11:57.4636256Z [W1204 09:30:31.415691286 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4636342Z 
2025-12-04T10:11:57.4636632Z [W1204 09:30:31.415870100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4636635Z 
2025-12-04T10:11:57.4636724Z ('RERUN', {'yellow': True}) [11.1094s] [100%]
2025-12-04T10:11:57.4637458Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:30:32.592067768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4637461Z 
2025-12-04T10:11:57.4637777Z [W1204 09:30:32.592610047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4637780Z 
2025-12-04T10:11:57.4638078Z [W1204 09:30:32.592775580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4638084Z 
2025-12-04T10:11:57.4638375Z [W1204 09:30:32.595692400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4638384Z 
2025-12-04T10:11:57.4638675Z [W1204 09:30:32.596252009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4638678Z 
2025-12-04T10:11:57.4650325Z [W1204 09:30:32.596410352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4650337Z 
2025-12-04T10:11:57.4650694Z [W1204 09:30:32.600974790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4650698Z 
2025-12-04T10:11:57.4651018Z [W1204 09:30:32.601433798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4651025Z 
2025-12-04T10:11:57.4651317Z [W1204 09:30:32.601574490 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4651321Z 
2025-12-04T10:11:57.4651416Z ('RERUN', {'yellow': True}) [0.5973s] [100%]
2025-12-04T10:11:57.4652186Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:30:33.188362203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4652190Z 
2025-12-04T10:11:57.4652505Z [W1204 09:30:33.188898852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4652512Z 
2025-12-04T10:11:57.4652804Z [W1204 09:30:33.189043585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4652810Z 
2025-12-04T10:11:57.4653096Z [W1204 09:30:33.191974845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4653100Z 
2025-12-04T10:11:57.4653393Z [W1204 09:30:33.192539584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4653396Z 
2025-12-04T10:11:57.4653681Z [W1204 09:30:33.192676867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4653684Z 
2025-12-04T10:11:57.4654095Z [W1204 09:30:33.197160983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4654099Z 
2025-12-04T10:11:57.4654395Z [W1204 09:30:33.197614671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4654464Z 
2025-12-04T10:11:57.4654759Z [W1204 09:30:33.197750633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4654762Z 
2025-12-04T10:11:57.4654827Z FAILED [0.5972s] [100%]
2025-12-04T10:11:57.4654830Z 
2025-12-04T10:11:57.4654927Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4655234Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4655311Z Traceback (most recent call last):
2025-12-04T10:11:57.4655637Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4655708Z     method(*args, **kwargs)
2025-12-04T10:11:57.4656008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4656074Z     method(*args, **kwargs)
2025-12-04T10:11:57.4656365Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4656430Z     with policy():
2025-12-04T10:11:57.4656729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4656796Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4657635Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4657642Z 
2025-12-04T10:11:57.4657784Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4658326Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4658332Z 
2025-12-04T10:11:57.4658496Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4658635Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4658739Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4659100Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4659244Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4659308Z graph_break []
2025-12-04T10:11:57.4659441Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4660149Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4660224Z   if out == self.unknown_value:
2025-12-04T10:11:57.4660537Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4660612Z Traceback (most recent call last):
2025-12-04T10:11:57.4660918Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4660990Z     method(*args, **kwargs)
2025-12-04T10:11:57.4661351Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4661415Z     method(*args, **kwargs)
2025-12-04T10:11:57.4661704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4662133Z     with policy():
2025-12-04T10:11:57.4662444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4662511Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4663350Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4663357Z 
2025-12-04T10:11:57.4663488Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4664016Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4664022Z 
2025-12-04T10:11:57.4664188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4664319Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4664419Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4664771Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4664899Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4664962Z graph_break []
2025-12-04T10:11:57.4665087Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4665783Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4665861Z   if out == self.unknown_value:
2025-12-04T10:11:57.4665985Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4666079Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4666202Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4666543Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4666608Z graph_break []
2025-12-04T10:11:57.4666690Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4666995Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4667067Z Traceback (most recent call last):
2025-12-04T10:11:57.4667377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4667446Z     method(*args, **kwargs)
2025-12-04T10:11:57.4667736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4667798Z     method(*args, **kwargs)
2025-12-04T10:11:57.4668095Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4668155Z     with policy():
2025-12-04T10:11:57.4668447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4668513Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4669417Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4669487Z 
2025-12-04T10:11:57.4669624Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4670148Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4670152Z 
2025-12-04T10:11:57.4670316Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4670440Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4670533Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4670882Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4671005Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4671068Z graph_break []
2025-12-04T10:11:57.4671200Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4671894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4671966Z   if out == self.unknown_value:
2025-12-04T10:11:57.4672087Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4672183Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4672305Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4672645Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4672711Z graph_break []
2025-12-04T10:11:57.4672831Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4672917Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4673039Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4673446Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4673556Z graph_break []
2025-12-04T10:11:57.4674150Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5fd5f82c697f5c0c.xml -
2025-12-04T10:11:57.4674253Z =========================== short test summary info ============================
2025-12-04T10:11:57.4675574Z FAILED [0.5972s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4675583Z 
2025-12-04T10:11:57.4675710Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4676325Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4676330Z 
2025-12-04T10:11:57.4676498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4676610Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4676821Z ================== 1 failed, 57 deselected, 2 rerun in 12.33s ==================
2025-12-04T10:11:57.4676886Z Got exit code 1
2025-12-04T10:11:57.4677375Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4677624Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.4677891Z W1204 09:30:39.873000 36951 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4678282Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93ba75b7427cf884.xml
2025-12-04T10:11:57.4678393Z ============================= test session starts ==============================
2025-12-04T10:11:57.4678606Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4678681Z cachedir: .pytest_cache
2025-12-04T10:11:57.4678990Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4679068Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4679138Z configfile: pytest.ini
2025-12-04T10:11:57.4679453Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4679582Z collecting ... collected 58 items / 5 deselected / 53 selected
2025-12-04T10:11:57.4679676Z stepcurrent: skipping 5 already run items.
2025-12-04T10:11:57.4679747Z Running 53 items in this shard
2025-12-04T10:11:57.4679750Z 
2025-12-04T10:11:57.4680350Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9428s] [  1%]
2025-12-04T10:11:57.4680845Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5280s] [  1%]
2025-12-04T10:11:57.4681287Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.5350s] [  1%]
2025-12-04T10:11:57.4681296Z 
2025-12-04T10:11:57.4681381Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4681679Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4681755Z Traceback (most recent call last):
2025-12-04T10:11:57.4682064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4682133Z     method(*args, **kwargs)
2025-12-04T10:11:57.4682435Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4682499Z     method(*args, **kwargs)
2025-12-04T10:11:57.4682794Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4682853Z     with policy():
2025-12-04T10:11:57.4683151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4683222Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4684110Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4684182Z 
2025-12-04T10:11:57.4684319Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4684843Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4684847Z 
2025-12-04T10:11:57.4685007Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4685139Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4685235Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4685789Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4685929Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4685994Z graph_break []
2025-12-04T10:11:57.4686288Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4686364Z Traceback (most recent call last):
2025-12-04T10:11:57.4686669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4686731Z     method(*args, **kwargs)
2025-12-04T10:11:57.4687022Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4687088Z     method(*args, **kwargs)
2025-12-04T10:11:57.4687383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4687443Z     with policy():
2025-12-04T10:11:57.4687742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4687819Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4688648Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4688652Z 
2025-12-04T10:11:57.4688781Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4689305Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4689309Z 
2025-12-04T10:11:57.4689465Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4689591Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4689690Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4690233Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4690361Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4690419Z graph_break []
2025-12-04T10:11:57.4690541Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4690636Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4690831Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4691374Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4691519Z graph_break []
2025-12-04T10:11:57.4691601Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4691893Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4691966Z Traceback (most recent call last):
2025-12-04T10:11:57.4692267Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4692333Z     method(*args, **kwargs)
2025-12-04T10:11:57.4692624Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4692690Z     method(*args, **kwargs)
2025-12-04T10:11:57.4692977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4693038Z     with policy():
2025-12-04T10:11:57.4693337Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4693400Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4694217Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4694224Z 
2025-12-04T10:11:57.4694350Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4694872Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4694877Z 
2025-12-04T10:11:57.4695040Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4695164Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4695256Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4695793Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4695917Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4695979Z graph_break []
2025-12-04T10:11:57.4696102Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4696189Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4696314Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4696855Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4696917Z graph_break []
2025-12-04T10:11:57.4697041Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4697131Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4697257Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4697864Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4697927Z graph_break []
2025-12-04T10:11:57.4698425Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93ba75b7427cf884.xml -
2025-12-04T10:11:57.4698592Z =========================== short test summary info ============================
2025-12-04T10:11:57.4699884Z FAILED [0.5350s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4699892Z 
2025-12-04T10:11:57.4700017Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4700545Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4700551Z 
2025-12-04T10:11:57.4700708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4700818Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4700936Z =================== 1 failed, 5 deselected, 2 rerun in 3.03s ===================
2025-12-04T10:11:57.4700994Z Got exit code 1
2025-12-04T10:11:57.4701064Z Retrying single test...
2025-12-04T10:11:57.4701327Z W1204 09:30:49.535000 37140 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4701720Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db3472bddf12b7a7.xml
2025-12-04T10:11:57.4701814Z ============================= test session starts ==============================
2025-12-04T10:11:57.4702027Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4702102Z cachedir: .pytest_cache
2025-12-04T10:11:57.4702409Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4702495Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4702566Z configfile: pytest.ini
2025-12-04T10:11:57.4702884Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4703018Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4703592Z stepcurrent: skipping 5 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4703663Z Running 1 items in this shard
2025-12-04T10:11:57.4703666Z 
2025-12-04T10:11:57.4704405Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:30:51.133610020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4704409Z 
2025-12-04T10:11:57.4704706Z [W1204 09:31:00.111308444 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4704710Z 
2025-12-04T10:11:57.4705004Z [W1204 09:31:00.111578758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4705007Z 
2025-12-04T10:11:57.4705385Z [W1204 09:31:00.117829945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4705389Z 
2025-12-04T10:11:57.4705682Z [W1204 09:31:00.118433966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4705751Z 
2025-12-04T10:11:57.4706041Z [W1204 09:31:00.118617479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4706045Z 
2025-12-04T10:11:57.4706340Z [W1204 09:31:00.124084572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4706343Z 
2025-12-04T10:11:57.4706631Z [W1204 09:31:00.124640912 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4706635Z 
2025-12-04T10:11:57.4706929Z [W1204 09:31:00.124797894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4706932Z 
2025-12-04T10:11:57.4707014Z ('RERUN', {'yellow': True}) [10.9282s] [100%]
2025-12-04T10:11:57.4707739Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:31:00.924744509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4707746Z 
2025-12-04T10:11:57.4708041Z [W1204 09:31:00.925259168 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4708044Z 
2025-12-04T10:11:57.4708330Z [W1204 09:31:00.925399560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4708333Z 
2025-12-04T10:11:57.4708636Z [W1204 09:31:00.928243399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4708639Z 
2025-12-04T10:11:57.4708930Z [W1204 09:31:00.928703027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4708935Z 
2025-12-04T10:11:57.4709227Z [W1204 09:31:00.928841009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4709230Z 
2025-12-04T10:11:57.4709515Z [W1204 09:31:01.933330455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4709519Z 
2025-12-04T10:11:57.4709809Z [W1204 09:31:01.933792854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4709813Z 
2025-12-04T10:11:57.4710106Z [W1204 09:31:01.933928416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4710109Z 
2025-12-04T10:11:57.4710189Z ('RERUN', {'yellow': True}) [0.4998s] [100%]
2025-12-04T10:11:57.4710925Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:31:01.423124474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4710931Z 
2025-12-04T10:11:57.4711226Z [W1204 09:31:01.423626882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4711230Z 
2025-12-04T10:11:57.4711522Z [W1204 09:31:01.423766285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4711525Z 
2025-12-04T10:11:57.4711883Z [W1204 09:31:01.426642214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4711886Z 
2025-12-04T10:11:57.4712179Z [W1204 09:31:01.427083721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4712246Z 
2025-12-04T10:11:57.4712538Z [W1204 09:31:01.427221134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4712541Z 
2025-12-04T10:11:57.4712831Z [W1204 09:31:01.431794032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4712834Z 
2025-12-04T10:11:57.4713124Z [W1204 09:31:01.432248700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4713127Z 
2025-12-04T10:11:57.4713422Z [W1204 09:31:01.432396892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4713425Z 
2025-12-04T10:11:57.4713486Z FAILED [0.4965s] [100%]
2025-12-04T10:11:57.4713489Z 
2025-12-04T10:11:57.4713572Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4713873Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4713947Z Traceback (most recent call last):
2025-12-04T10:11:57.4714260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4714329Z     method(*args, **kwargs)
2025-12-04T10:11:57.4714623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4714692Z     method(*args, **kwargs)
2025-12-04T10:11:57.4715001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4715062Z     with policy():
2025-12-04T10:11:57.4715362Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4715430Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4716235Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4716239Z 
2025-12-04T10:11:57.4716366Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4716894Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4716902Z 
2025-12-04T10:11:57.4717269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4717399Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4717500Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4718049Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4718184Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4718250Z graph_break []
2025-12-04T10:11:57.4718376Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4719209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4719284Z   if out == self.unknown_value:
2025-12-04T10:11:57.4719577Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4719749Z Traceback (most recent call last):
2025-12-04T10:11:57.4720088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4720159Z     method(*args, **kwargs)
2025-12-04T10:11:57.4720449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4720510Z     method(*args, **kwargs)
2025-12-04T10:11:57.4720816Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4720880Z     with policy():
2025-12-04T10:11:57.4721173Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4721242Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4722063Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4722070Z 
2025-12-04T10:11:57.4722200Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4722719Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4722723Z 
2025-12-04T10:11:57.4722893Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4723021Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4723114Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4723664Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4723789Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4723853Z graph_break []
2025-12-04T10:11:57.4723973Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4724661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4724737Z   if out == self.unknown_value:
2025-12-04T10:11:57.4724855Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4724948Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4725072Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4725606Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4725669Z graph_break []
2025-12-04T10:11:57.4725761Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4726055Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4726209Z Traceback (most recent call last):
2025-12-04T10:11:57.4726509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4726576Z     method(*args, **kwargs)
2025-12-04T10:11:57.4726869Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4727074Z     method(*args, **kwargs)
2025-12-04T10:11:57.4727366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4727424Z     with policy():
2025-12-04T10:11:57.4727721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4727788Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4728620Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4728624Z 
2025-12-04T10:11:57.4728754Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4729273Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4729276Z 
2025-12-04T10:11:57.4729438Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4729560Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4729648Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4730197Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4730320Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4730381Z graph_break []
2025-12-04T10:11:57.4730506Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4731196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4731270Z   if out == self.unknown_value:
2025-12-04T10:11:57.4731400Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4731498Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4731622Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4732160Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4732224Z graph_break []
2025-12-04T10:11:57.4732344Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4732434Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4732559Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4733097Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4733159Z graph_break []
2025-12-04T10:11:57.4733744Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db3472bddf12b7a7.xml -
2025-12-04T10:11:57.4733847Z =========================== short test summary info ============================
2025-12-04T10:11:57.4735145Z FAILED [0.4965s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4735213Z 
2025-12-04T10:11:57.4735339Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4735863Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4735866Z 
2025-12-04T10:11:57.4736023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4736141Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4736261Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ==================
2025-12-04T10:11:57.4736319Z Got exit code 1
2025-12-04T10:11:57.4736385Z Retrying single test...
2025-12-04T10:11:57.4736652Z W1204 09:31:07.982000 37334 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4737043Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-97d0cdfaafee5426.xml
2025-12-04T10:11:57.4737139Z ============================= test session starts ==============================
2025-12-04T10:11:57.4737353Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4737425Z cachedir: .pytest_cache
2025-12-04T10:11:57.4737741Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4737828Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4737892Z configfile: pytest.ini
2025-12-04T10:11:57.4738204Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4738336Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4738908Z stepcurrent: skipping 5 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4738979Z Running 1 items in this shard
2025-12-04T10:11:57.4738985Z 
2025-12-04T10:11:57.4739721Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:31:09.558200458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4739726Z 
2025-12-04T10:11:57.4740025Z [W1204 09:31:18.657581822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4740033Z 
2025-12-04T10:11:57.4740322Z [W1204 09:31:18.657839777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4740326Z 
2025-12-04T10:11:57.4740615Z [W1204 09:31:18.663645796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4740619Z 
2025-12-04T10:11:57.4741021Z [W1204 09:31:18.664205235 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4741025Z 
2025-12-04T10:11:57.4741315Z [W1204 09:31:18.664387829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4741385Z 
2025-12-04T10:11:57.4741686Z [W1204 09:31:18.669735400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4741688Z 
2025-12-04T10:11:57.4741979Z [W1204 09:31:18.670276819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4741986Z 
2025-12-04T10:11:57.4742276Z [W1204 09:31:18.670439282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4742280Z 
2025-12-04T10:11:57.4742373Z ('RERUN', {'yellow': True}) [11.0287s] [100%]
2025-12-04T10:11:57.4743112Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:31:19.465494855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4743118Z 
2025-12-04T10:11:57.4743411Z [W1204 09:31:19.466006143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4743414Z 
2025-12-04T10:11:57.4743708Z [W1204 09:31:19.466148626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4743711Z 
2025-12-04T10:11:57.4744003Z [W1204 09:31:19.468989754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4744006Z 
2025-12-04T10:11:57.4744305Z [W1204 09:31:19.469440682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4744308Z 
2025-12-04T10:11:57.4744598Z [W1204 09:31:19.469580324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4744604Z 
2025-12-04T10:11:57.4744892Z [W1204 09:31:19.473995540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4744902Z 
2025-12-04T10:11:57.4745196Z [W1204 09:31:19.474445718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4745199Z 
2025-12-04T10:11:57.4745489Z [W1204 09:31:19.474584110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4745492Z 
2025-12-04T10:11:57.4745579Z ('RERUN', {'yellow': True}) [0.4925s] [100%]
2025-12-04T10:11:57.4746309Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:31:20.956503298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4746315Z 
2025-12-04T10:11:57.4746613Z [W1204 09:31:20.957010497 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4746617Z 
2025-12-04T10:11:57.4746910Z [W1204 09:31:20.957153199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4746914Z 
2025-12-04T10:11:57.4747209Z [W1204 09:31:20.959995088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4747212Z 
2025-12-04T10:11:57.4747566Z [W1204 09:31:20.960471716 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4747570Z 
2025-12-04T10:11:57.4747863Z [W1204 09:31:20.960619338 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4747931Z 
2025-12-04T10:11:57.4748221Z [W1204 09:31:20.964999903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4748224Z 
2025-12-04T10:11:57.4748511Z [W1204 09:31:20.965450101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4748519Z 
2025-12-04T10:11:57.4748806Z [W1204 09:31:20.965585543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4748809Z 
2025-12-04T10:11:57.4748870Z FAILED [0.4865s] [100%]
2025-12-04T10:11:57.4748874Z 
2025-12-04T10:11:57.4748962Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4749258Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4749337Z Traceback (most recent call last):
2025-12-04T10:11:57.4749647Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4749711Z     method(*args, **kwargs)
2025-12-04T10:11:57.4750018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4750080Z     method(*args, **kwargs)
2025-12-04T10:11:57.4750385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4750456Z     with policy():
2025-12-04T10:11:57.4750761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4750837Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4751646Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4751653Z 
2025-12-04T10:11:57.4751784Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4752308Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4752312Z 
2025-12-04T10:11:57.4752469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4752606Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4752702Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4753247Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4753384Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4753442Z graph_break []
2025-12-04T10:11:57.4753572Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4754264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4754336Z   if out == self.unknown_value:
2025-12-04T10:11:57.4754705Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4754782Z Traceback (most recent call last):
2025-12-04T10:11:57.4755086Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4755242Z     method(*args, **kwargs)
2025-12-04T10:11:57.4755538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4755606Z     method(*args, **kwargs)
2025-12-04T10:11:57.4755902Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4755962Z     with policy():
2025-12-04T10:11:57.4756261Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4756327Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4757155Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4757162Z 
2025-12-04T10:11:57.4757287Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4757815Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4757819Z 
2025-12-04T10:11:57.4757974Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4758103Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4758204Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4758745Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4758881Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4758941Z graph_break []
2025-12-04T10:11:57.4759063Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4759758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4759830Z   if out == self.unknown_value:
2025-12-04T10:11:57.4760006Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4760100Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4760225Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4760774Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4760836Z graph_break []
2025-12-04T10:11:57.4760930Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4761229Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4761301Z Traceback (most recent call last):
2025-12-04T10:11:57.4761603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4761671Z     method(*args, **kwargs)
2025-12-04T10:11:57.4762036Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4762111Z     method(*args, **kwargs)
2025-12-04T10:11:57.4762404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4762536Z     with policy():
2025-12-04T10:11:57.4762838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4762906Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4763734Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4763737Z 
2025-12-04T10:11:57.4763867Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4764394Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4764400Z 
2025-12-04T10:11:57.4764559Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4764682Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4764781Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4765320Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4765453Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4765514Z graph_break []
2025-12-04T10:11:57.4765637Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4766334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4766406Z   if out == self.unknown_value:
2025-12-04T10:11:57.4766533Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4766622Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4766749Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4767298Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4767357Z graph_break []
2025-12-04T10:11:57.4767485Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4767588Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4767731Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4768277Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4768335Z graph_break []
2025-12-04T10:11:57.4768827Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-97d0cdfaafee5426.xml -
2025-12-04T10:11:57.4768939Z =========================== short test summary info ============================
2025-12-04T10:11:57.4770306Z FAILED [0.4865s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4770375Z 
2025-12-04T10:11:57.4770516Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4771043Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4771046Z 
2025-12-04T10:11:57.4771212Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4771322Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4771438Z ================== 1 failed, 57 deselected, 2 rerun in 12.03s ==================
2025-12-04T10:11:57.4771501Z Got exit code 1
2025-12-04T10:11:57.4771984Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4772253Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.4772524Z W1204 09:31:26.607000 37527 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4772915Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3a149555401c32cc.xml
2025-12-04T10:11:57.4773019Z ============================= test session starts ==============================
2025-12-04T10:11:57.4773233Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4773307Z cachedir: .pytest_cache
2025-12-04T10:11:57.4773612Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4773694Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4773765Z configfile: pytest.ini
2025-12-04T10:11:57.4774085Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4774214Z collecting ... collected 58 items / 6 deselected / 52 selected
2025-12-04T10:11:57.4774305Z stepcurrent: skipping 6 already run items.
2025-12-04T10:11:57.4774378Z Running 52 items in this shard
2025-12-04T10:11:57.4774382Z 
2025-12-04T10:11:57.4774889Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9379s] [  1%]
2025-12-04T10:11:57.4775385Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5367s] [  1%]
2025-12-04T10:11:57.4775844Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5346s] [  1%]
2025-12-04T10:11:57.4775848Z 
2025-12-04T10:11:57.4775930Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4776227Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.4776308Z Traceback (most recent call last):
2025-12-04T10:11:57.4776687Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4776760Z     method(*args, **kwargs)
2025-12-04T10:11:57.4777052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4777180Z     method(*args, **kwargs)
2025-12-04T10:11:57.4777476Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4777537Z     with policy():
2025-12-04T10:11:57.4777834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4777904Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4778716Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4778720Z 
2025-12-04T10:11:57.4778858Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4779380Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4779386Z 
2025-12-04T10:11:57.4779552Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4779677Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4779774Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4780331Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4780463Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4780527Z graph_break []
2025-12-04T10:11:57.4780817Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.4780893Z Traceback (most recent call last):
2025-12-04T10:11:57.4781203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4781272Z     method(*args, **kwargs)
2025-12-04T10:11:57.4781566Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4781636Z     method(*args, **kwargs)
2025-12-04T10:11:57.4781926Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4781992Z     with policy():
2025-12-04T10:11:57.4782290Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4782357Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4783186Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4783192Z 
2025-12-04T10:11:57.4783316Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4783840Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4783844Z 
2025-12-04T10:11:57.4784069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4784199Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4784297Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4784853Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4785070Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4785130Z graph_break []
2025-12-04T10:11:57.4785256Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4785351Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4785473Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4786016Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4786076Z graph_break []
2025-12-04T10:11:57.4786161Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4786461Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.4786536Z Traceback (most recent call last):
2025-12-04T10:11:57.4786838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4786907Z     method(*args, **kwargs)
2025-12-04T10:11:57.4787200Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4787278Z     method(*args, **kwargs)
2025-12-04T10:11:57.4787575Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4787637Z     with policy():
2025-12-04T10:11:57.4787938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4788012Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4788838Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4788842Z 
2025-12-04T10:11:57.4788967Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4789491Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4789500Z 
2025-12-04T10:11:57.4789656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4789779Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4789878Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4790421Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4790550Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4790619Z graph_break []
2025-12-04T10:11:57.4790744Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4790850Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4791048Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4791587Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4791717Z graph_break []
2025-12-04T10:11:57.4791840Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4791937Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4792061Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4792596Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4792659Z graph_break []
2025-12-04T10:11:57.4793148Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3a149555401c32cc.xml -
2025-12-04T10:11:57.4793253Z =========================== short test summary info ============================
2025-12-04T10:11:57.4794549Z FAILED [0.5346s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4794553Z 
2025-12-04T10:11:57.4794682Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4795207Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4795210Z 
2025-12-04T10:11:57.4795363Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4795473Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4795587Z =================== 1 failed, 6 deselected, 2 rerun in 3.03s ===================
2025-12-04T10:11:57.4795653Z Got exit code 1
2025-12-04T10:11:57.4795717Z Retrying single test...
2025-12-04T10:11:57.4795983Z W1204 09:31:36.178000 37716 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4796371Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-64ab9f5424c5493f.xml
2025-12-04T10:11:57.4796471Z ============================= test session starts ==============================
2025-12-04T10:11:57.4796681Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4796756Z cachedir: .pytest_cache
2025-12-04T10:11:57.4797064Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4797150Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4797218Z configfile: pytest.ini
2025-12-04T10:11:57.4797547Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4797688Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4798261Z stepcurrent: skipping 6 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4798406Z Running 1 items in this shard
2025-12-04T10:11:57.4798415Z 
2025-12-04T10:11:57.4799150Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:31:37.767041253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4799219Z 
2025-12-04T10:11:57.4799524Z [W1204 09:31:47.962273885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4799533Z 
2025-12-04T10:11:57.4799827Z [W1204 09:31:47.962529549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4799831Z 
2025-12-04T10:11:57.4800213Z [W1204 09:31:47.968336088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4800216Z 
2025-12-04T10:11:57.4800515Z [W1204 09:31:47.968929309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4800519Z 
2025-12-04T10:11:57.4800811Z [W1204 09:31:47.969115432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4800816Z 
2025-12-04T10:11:57.4801114Z [W1204 09:31:47.974502114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4801117Z 
2025-12-04T10:11:57.4801406Z [W1204 09:31:47.975044133 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4801410Z 
2025-12-04T10:11:57.4801704Z [W1204 09:31:47.975203636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4801708Z 
2025-12-04T10:11:57.4801794Z ('RERUN', {'yellow': True}) [11.1389s] [100%]
2025-12-04T10:11:57.4802526Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:31:47.777956342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4802538Z 
2025-12-04T10:11:57.4802830Z [W1204 09:31:47.778482331 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4802834Z 
2025-12-04T10:11:57.4803124Z [W1204 09:31:47.778620293 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4803127Z 
2025-12-04T10:11:57.4803420Z [W1204 09:31:47.781593514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4803423Z 
2025-12-04T10:11:57.4803715Z [W1204 09:31:47.782054621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4803718Z 
2025-12-04T10:11:57.4804015Z [W1204 09:31:47.782191474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4804021Z 
2025-12-04T10:11:57.4804320Z [W1204 09:31:47.786757072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4804325Z 
2025-12-04T10:11:57.4804623Z [W1204 09:31:47.787213480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4804626Z 
2025-12-04T10:11:57.4804915Z [W1204 09:31:47.787349512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4804919Z 
2025-12-04T10:11:57.4805079Z ('RERUN', {'yellow': True}) [0.5056s] [100%]
2025-12-04T10:11:57.4805806Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:31:48.296193198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4805873Z 
2025-12-04T10:11:57.4806165Z [W1204 09:31:48.296718747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4806175Z 
2025-12-04T10:11:57.4806468Z [W1204 09:31:48.296860140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4806471Z 
2025-12-04T10:11:57.4806761Z [W1204 09:31:48.299783579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4806765Z 
2025-12-04T10:11:57.4807062Z [W1204 09:31:48.300255838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4807065Z 
2025-12-04T10:11:57.4807352Z [W1204 09:31:48.300408670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4807358Z 
2025-12-04T10:11:57.4807653Z [W1204 09:31:48.304935187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4807656Z 
2025-12-04T10:11:57.4807945Z [W1204 09:31:48.305384995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4807948Z 
2025-12-04T10:11:57.4808243Z [W1204 09:31:48.305521017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4808246Z 
2025-12-04T10:11:57.4808311Z FAILED [0.5075s] [100%]
2025-12-04T10:11:57.4808314Z 
2025-12-04T10:11:57.4808403Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4808706Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.4808786Z Traceback (most recent call last):
2025-12-04T10:11:57.4809102Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4809167Z     method(*args, **kwargs)
2025-12-04T10:11:57.4809464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4809536Z     method(*args, **kwargs)
2025-12-04T10:11:57.4809829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4809899Z     with policy():
2025-12-04T10:11:57.4810200Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4810267Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4811086Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4811093Z 
2025-12-04T10:11:57.4811223Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4811752Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4811755Z 
2025-12-04T10:11:57.4811918Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4812133Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4812240Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4812783Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4812986Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4813048Z graph_break []
2025-12-04T10:11:57.4813174Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4813880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4813955Z   if out == self.unknown_value:
2025-12-04T10:11:57.4814255Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.4814328Z Traceback (most recent call last):
2025-12-04T10:11:57.4814631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4814706Z     method(*args, **kwargs)
2025-12-04T10:11:57.4814999Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4815061Z     method(*args, **kwargs)
2025-12-04T10:11:57.4815359Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4815421Z     with policy():
2025-12-04T10:11:57.4815723Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4815796Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4816617Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4816629Z 
2025-12-04T10:11:57.4816757Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4817459Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4817462Z 
2025-12-04T10:11:57.4817628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4817757Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4817863Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4818407Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4818538Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4818604Z graph_break []
2025-12-04T10:11:57.4818725Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4819414Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4819492Z   if out == self.unknown_value:
2025-12-04T10:11:57.4819728Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4819828Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4819952Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4820501Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4820660Z graph_break []
2025-12-04T10:11:57.4820744Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4821038Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.4821110Z Traceback (most recent call last):
2025-12-04T10:11:57.4821413Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4821484Z     method(*args, **kwargs)
2025-12-04T10:11:57.4821784Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4821847Z     method(*args, **kwargs)
2025-12-04T10:11:57.4822147Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4822208Z     with policy():
2025-12-04T10:11:57.4822508Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4822588Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4823419Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4823430Z 
2025-12-04T10:11:57.4823556Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4824075Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4824081Z 
2025-12-04T10:11:57.4824243Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4824366Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4824462Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4825003Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4825129Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4825194Z graph_break []
2025-12-04T10:11:57.4825319Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4826016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4826090Z   if out == self.unknown_value:
2025-12-04T10:11:57.4826212Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4826309Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4826431Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4827045Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4827110Z graph_break []
2025-12-04T10:11:57.4827231Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4827325Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4827512Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4828051Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4828117Z graph_break []
2025-12-04T10:11:57.4828604Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-64ab9f5424c5493f.xml -
2025-12-04T10:11:57.4828712Z =========================== short test summary info ============================
2025-12-04T10:11:57.4830008Z FAILED [0.5075s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4830015Z 
2025-12-04T10:11:57.4830144Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4830663Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4830667Z 
2025-12-04T10:11:57.4830825Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4830937Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4831053Z ================== 1 failed, 57 deselected, 2 rerun in 12.18s ==================
2025-12-04T10:11:57.4831123Z Got exit code 1
2025-12-04T10:11:57.4831187Z Retrying single test...
2025-12-04T10:11:57.4831450Z W1204 09:31:54.909000 37910 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4831841Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bdae605562476ceb.xml
2025-12-04T10:11:57.4831936Z ============================= test session starts ==============================
2025-12-04T10:11:57.4832153Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4832221Z cachedir: .pytest_cache
2025-12-04T10:11:57.4832530Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4832613Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4832678Z configfile: pytest.ini
2025-12-04T10:11:57.4832996Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4833133Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4833703Z stepcurrent: skipping 6 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4833786Z Running 1 items in this shard
2025-12-04T10:11:57.4833790Z 
2025-12-04T10:11:57.4834598Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:31:56.509788887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4834603Z 
2025-12-04T10:11:57.4834909Z [W1204 09:32:05.750867435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4834977Z 
2025-12-04T10:11:57.4835272Z [W1204 09:32:05.751128189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4835276Z 
2025-12-04T10:11:57.4835566Z [W1204 09:32:05.756947319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4835575Z 
2025-12-04T10:11:57.4835865Z [W1204 09:32:05.757551830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4835869Z 
2025-12-04T10:11:57.4836158Z [W1204 09:32:05.757734133 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4836161Z 
2025-12-04T10:11:57.4836461Z [W1204 09:32:05.763097335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4836467Z 
2025-12-04T10:11:57.4836757Z [W1204 09:32:05.763637634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4836760Z 
2025-12-04T10:11:57.4837055Z [W1204 09:32:05.763800517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4837058Z 
2025-12-04T10:11:57.4837141Z ('RERUN', {'yellow': True}) [11.1941s] [100%]
2025-12-04T10:11:57.4837883Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:32:06.566384240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4837888Z 
2025-12-04T10:11:57.4838183Z [W1204 09:32:06.566880988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4838189Z 
2025-12-04T10:11:57.4838487Z [W1204 09:32:06.567020251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4838490Z 
2025-12-04T10:11:57.4838781Z [W1204 09:32:06.569899610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4838784Z 
2025-12-04T10:11:57.4839074Z [W1204 09:32:06.570385918 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4839077Z 
2025-12-04T10:11:57.4839372Z [W1204 09:32:06.570527940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4839375Z 
2025-12-04T10:11:57.4839664Z [W1204 09:32:06.574949046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4839671Z 
2025-12-04T10:11:57.4840008Z [W1204 09:32:06.575397154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4840011Z 
2025-12-04T10:11:57.4840298Z [W1204 09:32:06.575531876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4840301Z 
2025-12-04T10:11:57.4840384Z ('RERUN', {'yellow': True}) [0.5008s] [100%]
2025-12-04T10:11:57.4841275Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:32:07.066086491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4841279Z 
2025-12-04T10:11:57.4841576Z [W1204 09:32:07.066592690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4841644Z 
2025-12-04T10:11:57.4841931Z [W1204 09:32:07.066731342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4841934Z 
2025-12-04T10:11:57.4842220Z [W1204 09:32:07.069597131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4842227Z 
2025-12-04T10:11:57.4842519Z [W1204 09:32:07.070060799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4842523Z 
2025-12-04T10:11:57.4842813Z [W1204 09:32:07.070202162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4842816Z 
2025-12-04T10:11:57.4843121Z [W1204 09:32:07.074655638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4843128Z 
2025-12-04T10:11:57.4843419Z [W1204 09:32:07.075105216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4843422Z 
2025-12-04T10:11:57.4843711Z [W1204 09:32:07.075242259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4843715Z 
2025-12-04T10:11:57.4843776Z FAILED [0.5016s] [100%]
2025-12-04T10:11:57.4843780Z 
2025-12-04T10:11:57.4843863Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4844158Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.4844231Z Traceback (most recent call last):
2025-12-04T10:11:57.4844540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4844603Z     method(*args, **kwargs)
2025-12-04T10:11:57.4844900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4844966Z     method(*args, **kwargs)
2025-12-04T10:11:57.4845255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4845318Z     with policy():
2025-12-04T10:11:57.4845609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4845675Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4846485Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4846492Z 
2025-12-04T10:11:57.4846618Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4847142Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4847146Z 
2025-12-04T10:11:57.4847304Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4847436Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4847538Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4848158Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4848291Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4848414Z graph_break []
2025-12-04T10:11:57.4848538Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4849230Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4849299Z   if out == self.unknown_value:
2025-12-04T10:11:57.4849592Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.4849664Z Traceback (most recent call last):
2025-12-04T10:11:57.4849963Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4850031Z     method(*args, **kwargs)
2025-12-04T10:11:57.4850323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4850392Z     method(*args, **kwargs)
2025-12-04T10:11:57.4850683Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4850742Z     with policy():
2025-12-04T10:11:57.4851039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4851105Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4851925Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4851933Z 
2025-12-04T10:11:57.4852059Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4852577Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4852581Z 
2025-12-04T10:11:57.4852741Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4852865Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4852963Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4853509Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4853634Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4853697Z graph_break []
2025-12-04T10:11:57.4853820Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4854517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4854586Z   if out == self.unknown_value:
2025-12-04T10:11:57.4854710Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4854817Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4854941Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4855561Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4855623Z graph_break []
2025-12-04T10:11:57.4855772Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4856065Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.4856138Z Traceback (most recent call last):
2025-12-04T10:11:57.4856436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4856505Z     method(*args, **kwargs)
2025-12-04T10:11:57.4856799Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4856866Z     method(*args, **kwargs)
2025-12-04T10:11:57.4857159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4857220Z     with policy():
2025-12-04T10:11:57.4857520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4857589Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4858419Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4858423Z 
2025-12-04T10:11:57.4858548Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4859063Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4859067Z 
2025-12-04T10:11:57.4859229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4859353Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4859449Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4859993Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4860118Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4860180Z graph_break []
2025-12-04T10:11:57.4860303Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4860994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4861062Z   if out == self.unknown_value:
2025-12-04T10:11:57.4861185Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4861280Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4861401Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4861945Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4862005Z graph_break []
2025-12-04T10:11:57.4862129Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4862292Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4862414Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4862947Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4863096Z graph_break []
2025-12-04T10:11:57.4863584Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bdae605562476ceb.xml -
2025-12-04T10:11:57.4863691Z =========================== short test summary info ============================
2025-12-04T10:11:57.4864988Z FAILED [0.5016s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4864995Z 
2025-12-04T10:11:57.4865122Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4865642Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4865646Z 
2025-12-04T10:11:57.4865806Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4865908Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4866028Z ================== 1 failed, 57 deselected, 2 rerun in 12.22s ==================
2025-12-04T10:11:57.4866093Z Got exit code 1
2025-12-04T10:11:57.4866564Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.4866813Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.4867080Z W1204 09:32:13.708000 38104 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4867467Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b2bbf25d96b76c9b.xml
2025-12-04T10:11:57.4867565Z ============================= test session starts ==============================
2025-12-04T10:11:57.4867774Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4867840Z cachedir: .pytest_cache
2025-12-04T10:11:57.4868148Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4868224Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4868291Z configfile: pytest.ini
2025-12-04T10:11:57.4868608Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4868747Z collecting ... collected 58 items / 7 deselected / 51 selected
2025-12-04T10:11:57.4868839Z stepcurrent: skipping 7 already run items.
2025-12-04T10:11:57.4868907Z Running 51 items in this shard
2025-12-04T10:11:57.4868910Z 
2025-12-04T10:11:57.4869413Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8969s] [  1%]
2025-12-04T10:11:57.4869985Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4780s] [  1%]
2025-12-04T10:11:57.4870436Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4686s] [  1%]
2025-12-04T10:11:57.4870504Z 
2025-12-04T10:11:57.4870592Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4870885Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4870962Z Traceback (most recent call last):
2025-12-04T10:11:57.4871266Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4871332Z     method(*args, **kwargs)
2025-12-04T10:11:57.4871633Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4871697Z     method(*args, **kwargs)
2025-12-04T10:11:57.4871988Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4872054Z     with policy():
2025-12-04T10:11:57.4872349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4872419Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4873226Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4873230Z 
2025-12-04T10:11:57.4873360Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4873887Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4873892Z 
2025-12-04T10:11:57.4874048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4874176Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4874272Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4874621Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4874749Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4874808Z graph_break []
2025-12-04T10:11:57.4875108Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4875180Z Traceback (most recent call last):
2025-12-04T10:11:57.4875476Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4875553Z     method(*args, **kwargs)
2025-12-04T10:11:57.4875845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4875910Z     method(*args, **kwargs)
2025-12-04T10:11:57.4876199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4876257Z     with policy():
2025-12-04T10:11:57.4876554Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4876623Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4877520Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4877587Z 
2025-12-04T10:11:57.4877714Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4878232Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4878239Z 
2025-12-04T10:11:57.4878395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4878519Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4878614Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4878964Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4879087Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4879151Z graph_break []
2025-12-04T10:11:57.4879277Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4879372Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4879490Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4879843Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4879950Z graph_break []
2025-12-04T10:11:57.4880036Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4880333Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4880409Z Traceback (most recent call last):
2025-12-04T10:11:57.4880708Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4880778Z     method(*args, **kwargs)
2025-12-04T10:11:57.4881067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4881133Z     method(*args, **kwargs)
2025-12-04T10:11:57.4881427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4881485Z     with policy():
2025-12-04T10:11:57.4881782Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4881852Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4882670Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4882677Z 
2025-12-04T10:11:57.4882802Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4883319Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4883324Z 
2025-12-04T10:11:57.4883481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4883603Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4883694Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4884116Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4884241Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4884366Z graph_break []
2025-12-04T10:11:57.4884488Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4884575Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4884700Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4885038Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4885097Z graph_break []
2025-12-04T10:11:57.4885221Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4885307Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4885435Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4885772Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4885831Z graph_break []
2025-12-04T10:11:57.4886322Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b2bbf25d96b76c9b.xml -
2025-12-04T10:11:57.4886421Z =========================== short test summary info ============================
2025-12-04T10:11:57.4887725Z FAILED [0.4686s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4887729Z 
2025-12-04T10:11:57.4887855Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4888381Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4888385Z 
2025-12-04T10:11:57.4888542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4888645Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4888773Z =================== 1 failed, 7 deselected, 2 rerun in 2.87s ===================
2025-12-04T10:11:57.4888833Z Got exit code 1
2025-12-04T10:11:57.4888897Z Retrying single test...
2025-12-04T10:11:57.4889166Z W1204 09:32:23.368000 38292 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4889554Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5ac824c12758af27.xml
2025-12-04T10:11:57.4889656Z ============================= test session starts ==============================
2025-12-04T10:11:57.4889864Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4889930Z cachedir: .pytest_cache
2025-12-04T10:11:57.4890238Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4890313Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4890380Z configfile: pytest.ini
2025-12-04T10:11:57.4890766Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4890894Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4891472Z stepcurrent: skipping 7 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4891635Z Running 1 items in this shard
2025-12-04T10:11:57.4891639Z 
2025-12-04T10:11:57.4892378Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:32:24.662379778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4892382Z 
2025-12-04T10:11:57.4892681Z [W1204 09:32:33.900461285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4892685Z 
2025-12-04T10:11:57.4892985Z [W1204 09:32:33.900713640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4892989Z 
2025-12-04T10:11:57.4893277Z [W1204 09:32:33.906515510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4893283Z 
2025-12-04T10:11:57.4893571Z [W1204 09:32:33.907124810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4893574Z 
2025-12-04T10:11:57.4893872Z [W1204 09:32:33.907296653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4893875Z 
2025-12-04T10:11:57.4894160Z [W1204 09:32:33.912789468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4894163Z 
2025-12-04T10:11:57.4894461Z [W1204 09:32:33.913329287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4894465Z 
2025-12-04T10:11:57.4894751Z [W1204 09:32:33.913493520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4894757Z 
2025-12-04T10:11:57.4894841Z ('RERUN', {'yellow': True}) [11.1534s] [100%]
2025-12-04T10:11:57.4895571Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:32:34.934545480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4895574Z 
2025-12-04T10:11:57.4895868Z [W1204 09:32:34.935069449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4895872Z 
2025-12-04T10:11:57.4896161Z [W1204 09:32:34.935208421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4896164Z 
2025-12-04T10:11:57.4896453Z [W1204 09:32:34.938117661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4896462Z 
2025-12-04T10:11:57.4896752Z [W1204 09:32:34.938676821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4896755Z 
2025-12-04T10:11:57.4897042Z [W1204 09:32:34.938817933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4897046Z 
2025-12-04T10:11:57.4897338Z [W1204 09:32:35.943361821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4897342Z 
2025-12-04T10:11:57.4897710Z [W1204 09:32:35.943818549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4897714Z 
2025-12-04T10:11:57.4898010Z [W1204 09:32:35.943956642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4898076Z 
2025-12-04T10:11:57.4898156Z ('RERUN', {'yellow': True}) [0.4446s] [100%]
2025-12-04T10:11:57.4898891Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:32:35.379063070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4898894Z 
2025-12-04T10:11:57.4899184Z [W1204 09:32:35.379586549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4899188Z 
2025-12-04T10:11:57.4899487Z [W1204 09:32:35.379727122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4899491Z 
2025-12-04T10:11:57.4899777Z [W1204 09:32:35.382658992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4899783Z 
2025-12-04T10:11:57.4900072Z [W1204 09:32:35.383211512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4900080Z 
2025-12-04T10:11:57.4900368Z [W1204 09:32:35.383348644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4900371Z 
2025-12-04T10:11:57.4900661Z [W1204 09:32:35.387831571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4900664Z 
2025-12-04T10:11:57.4900973Z [W1204 09:32:35.388282969 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4900977Z 
2025-12-04T10:11:57.4901264Z [W1204 09:32:35.388429032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4901270Z 
2025-12-04T10:11:57.4901335Z FAILED [0.4426s] [100%]
2025-12-04T10:11:57.4901338Z 
2025-12-04T10:11:57.4901425Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4901724Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4901802Z Traceback (most recent call last):
2025-12-04T10:11:57.4902110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4902183Z     method(*args, **kwargs)
2025-12-04T10:11:57.4902479Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4902543Z     method(*args, **kwargs)
2025-12-04T10:11:57.4902840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4902907Z     with policy():
2025-12-04T10:11:57.4903201Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4903271Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4904083Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4904087Z 
2025-12-04T10:11:57.4904225Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4904820Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4904824Z 
2025-12-04T10:11:57.4905049Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4905175Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4905270Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4905624Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4905749Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4905811Z graph_break []
2025-12-04T10:11:57.4905936Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4906636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4906716Z   if out == self.unknown_value:
2025-12-04T10:11:57.4907009Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4907090Z Traceback (most recent call last):
2025-12-04T10:11:57.4907399Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4907464Z     method(*args, **kwargs)
2025-12-04T10:11:57.4907760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4907822Z     method(*args, **kwargs)
2025-12-04T10:11:57.4908114Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4908180Z     with policy():
2025-12-04T10:11:57.4908472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4908543Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4909367Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4909371Z 
2025-12-04T10:11:57.4909495Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4910026Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4910030Z 
2025-12-04T10:11:57.4910185Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4910314Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4910410Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4910760Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4910887Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4910943Z graph_break []
2025-12-04T10:11:57.4911072Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4911837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4911907Z   if out == self.unknown_value:
2025-12-04T10:11:57.4912032Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4912189Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4912318Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4912662Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4912722Z graph_break []
2025-12-04T10:11:57.4912808Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4913108Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4913182Z Traceback (most recent call last):
2025-12-04T10:11:57.4913489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4913552Z     method(*args, **kwargs)
2025-12-04T10:11:57.4913845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4913910Z     method(*args, **kwargs)
2025-12-04T10:11:57.4914198Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4914264Z     with policy():
2025-12-04T10:11:57.4914555Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4914623Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4915450Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4915454Z 
2025-12-04T10:11:57.4915581Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4916107Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4916110Z 
2025-12-04T10:11:57.4916267Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4916393Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4916483Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4916828Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4916959Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4917123Z graph_break []
2025-12-04T10:11:57.4917251Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4917942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4918009Z   if out == self.unknown_value:
2025-12-04T10:11:57.4918137Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4918225Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4918353Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4918815Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4918879Z graph_break []
2025-12-04T10:11:57.4919019Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4919200Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4919321Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4919666Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4919723Z graph_break []
2025-12-04T10:11:57.4920246Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5ac824c12758af27.xml -
2025-12-04T10:11:57.4920349Z =========================== short test summary info ============================
2025-12-04T10:11:57.4921652Z FAILED [0.4426s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4921667Z 
2025-12-04T10:11:57.4921789Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4922308Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4922312Z 
2025-12-04T10:11:57.4922473Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4922575Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4922706Z ================== 1 failed, 57 deselected, 2 rerun in 12.06s ==================
2025-12-04T10:11:57.4922770Z Got exit code 1
2025-12-04T10:11:57.4922840Z Retrying single test...
2025-12-04T10:11:57.4923125Z W1204 09:32:42.025000 38485 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4923511Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5a8939c38696fa6e.xml
2025-12-04T10:11:57.4923611Z ============================= test session starts ==============================
2025-12-04T10:11:57.4923845Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4923913Z cachedir: .pytest_cache
2025-12-04T10:11:57.4924230Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4924305Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4924370Z configfile: pytest.ini
2025-12-04T10:11:57.4924687Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4924815Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4925389Z stepcurrent: skipping 7 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4925459Z Running 1 items in this shard
2025-12-04T10:11:57.4925463Z 
2025-12-04T10:11:57.4926268Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:32:43.322060700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4926277Z 
2025-12-04T10:11:57.4926577Z [W1204 09:32:52.197663170 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4926645Z 
2025-12-04T10:11:57.4926937Z [W1204 09:32:52.197911265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4926940Z 
2025-12-04T10:11:57.4927233Z [W1204 09:32:52.203659784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4927236Z 
2025-12-04T10:11:57.4927548Z [W1204 09:32:52.204267884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4927552Z 
2025-12-04T10:11:57.4927900Z [W1204 09:32:52.204449087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4927903Z 
2025-12-04T10:11:57.4928247Z [W1204 09:32:52.209825500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4928254Z 
2025-12-04T10:11:57.4928602Z [W1204 09:32:52.210363229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4928606Z 
2025-12-04T10:11:57.4928954Z [W1204 09:32:52.210529682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4928958Z 
2025-12-04T10:11:57.4929055Z ('RERUN', {'yellow': True}) [10.7953s] [100%]
2025-12-04T10:11:57.4929902Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:32:53.236600564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4929906Z 
2025-12-04T10:11:57.4930194Z [W1204 09:32:53.237122413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4930204Z 
2025-12-04T10:11:57.4930491Z [W1204 09:32:53.237264965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4930494Z 
2025-12-04T10:11:57.4930781Z [W1204 09:32:53.240151695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4930785Z 
2025-12-04T10:11:57.4931077Z [W1204 09:32:53.240708464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4931081Z 
2025-12-04T10:11:57.4931370Z [W1204 09:32:53.240850847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4931373Z 
2025-12-04T10:11:57.4931665Z [W1204 09:32:53.245336364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4931668Z 
2025-12-04T10:11:57.4931958Z [W1204 09:32:53.245790192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4931961Z 
2025-12-04T10:11:57.4932257Z [W1204 09:32:53.245925174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4932260Z 
2025-12-04T10:11:57.4932338Z ('RERUN', {'yellow': True}) [0.4499s] [100%]
2025-12-04T10:11:57.4933136Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:32:53.684496682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4933146Z 
2025-12-04T10:11:57.4933435Z [W1204 09:32:53.685014021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4933502Z 
2025-12-04T10:11:57.4933792Z [W1204 09:32:53.685153033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4933795Z 
2025-12-04T10:11:57.4934086Z [W1204 09:32:53.687987522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4934090Z 
2025-12-04T10:11:57.4934376Z [W1204 09:32:53.688535731 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4934379Z 
2025-12-04T10:11:57.4934680Z [W1204 09:32:53.688679264 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4934683Z 
2025-12-04T10:11:57.4934971Z [W1204 09:32:53.693195522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4934974Z 
2025-12-04T10:11:57.4935268Z [W1204 09:32:53.693645199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4935271Z 
2025-12-04T10:11:57.4935561Z [W1204 09:32:53.693779762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4935565Z 
2025-12-04T10:11:57.4935633Z FAILED [0.4469s] [100%]
2025-12-04T10:11:57.4935638Z 
2025-12-04T10:11:57.4935728Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4936026Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4936109Z Traceback (most recent call last):
2025-12-04T10:11:57.4936417Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4936482Z     method(*args, **kwargs)
2025-12-04T10:11:57.4936783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4936844Z     method(*args, **kwargs)
2025-12-04T10:11:57.4937140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4937198Z     with policy():
2025-12-04T10:11:57.4937508Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4937591Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4938566Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4938571Z 
2025-12-04T10:11:57.4938724Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4939343Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4939346Z 
2025-12-04T10:11:57.4939507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4939635Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4939729Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4940161Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4940290Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4940350Z graph_break []
2025-12-04T10:11:57.4940477Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4941239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4941312Z   if out == self.unknown_value:
2025-12-04T10:11:57.4941607Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4941679Z Traceback (most recent call last):
2025-12-04T10:11:57.4941987Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4942050Z     method(*args, **kwargs)
2025-12-04T10:11:57.4942354Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4942426Z     method(*args, **kwargs)
2025-12-04T10:11:57.4942720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4942783Z     with policy():
2025-12-04T10:11:57.4943076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4943144Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4943973Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4943977Z 
2025-12-04T10:11:57.4944101Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4944630Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4944636Z 
2025-12-04T10:11:57.4944795Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4944921Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4945018Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4945370Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4945497Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4945559Z graph_break []
2025-12-04T10:11:57.4945681Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4946376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4946449Z   if out == self.unknown_value:
2025-12-04T10:11:57.4946576Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4946666Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4946790Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4947141Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4947292Z graph_break []
2025-12-04T10:11:57.4947382Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4947676Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.4947815Z Traceback (most recent call last):
2025-12-04T10:11:57.4948119Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4948181Z     method(*args, **kwargs)
2025-12-04T10:11:57.4948474Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4948542Z     method(*args, **kwargs)
2025-12-04T10:11:57.4948835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4948900Z     with policy():
2025-12-04T10:11:57.4949193Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4949258Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4950085Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4950091Z 
2025-12-04T10:11:57.4950215Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4950741Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4950745Z 
2025-12-04T10:11:57.4950900Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4951032Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4951125Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4951468Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4951602Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4951660Z graph_break []
2025-12-04T10:11:57.4951783Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4952475Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4952541Z   if out == self.unknown_value:
2025-12-04T10:11:57.4952669Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4952770Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4952895Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4953247Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4953306Z graph_break []
2025-12-04T10:11:57.4953430Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4953517Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4953634Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4953979Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4954036Z graph_break []
2025-12-04T10:11:57.4954673Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5a8939c38696fa6e.xml -
2025-12-04T10:11:57.4954779Z =========================== short test summary info ============================
2025-12-04T10:11:57.4956139Z FAILED [0.4469s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4956147Z 
2025-12-04T10:11:57.4956270Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4956797Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4956801Z 
2025-12-04T10:11:57.4956960Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4957065Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4957178Z ================== 1 failed, 57 deselected, 2 rerun in 11.72s ==================
2025-12-04T10:11:57.4957240Z Got exit code 1
2025-12-04T10:11:57.4957713Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.4957969Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.4958232Z W1204 09:33:00.310000 38678 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4958623Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8de08e52169132e4.xml
2025-12-04T10:11:57.4958731Z ============================= test session starts ==============================
2025-12-04T10:11:57.4958939Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4959008Z cachedir: .pytest_cache
2025-12-04T10:11:57.4959313Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4959388Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4959458Z configfile: pytest.ini
2025-12-04T10:11:57.4959777Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4959957Z collecting ... collected 58 items / 8 deselected / 50 selected
2025-12-04T10:11:57.4960059Z stepcurrent: skipping 8 already run items.
2025-12-04T10:11:57.4960133Z Running 50 items in this shard
2025-12-04T10:11:57.4960137Z 
2025-12-04T10:11:57.4960641Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9320s] [  2%]
2025-12-04T10:11:57.4961131Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5370s] [  2%]
2025-12-04T10:11:57.4961587Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.5338s] [  2%]
2025-12-04T10:11:57.4961591Z 
2025-12-04T10:11:57.4961747Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4962042Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4962122Z Traceback (most recent call last):
2025-12-04T10:11:57.4962427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4962561Z     method(*args, **kwargs)
2025-12-04T10:11:57.4962858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4962920Z     method(*args, **kwargs)
2025-12-04T10:11:57.4963215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4963275Z     with policy():
2025-12-04T10:11:57.4963572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4963654Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4964454Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4964461Z 
2025-12-04T10:11:57.4964593Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4965113Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4965116Z 
2025-12-04T10:11:57.4965280Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4965410Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4965509Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4966057Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4966189Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4966257Z graph_break []
2025-12-04T10:11:57.4966555Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4966631Z Traceback (most recent call last):
2025-12-04T10:11:57.4966933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4966995Z     method(*args, **kwargs)
2025-12-04T10:11:57.4967288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4967356Z     method(*args, **kwargs)
2025-12-04T10:11:57.4967647Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4967712Z     with policy():
2025-12-04T10:11:57.4968014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4968082Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4968895Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.4968900Z 
2025-12-04T10:11:57.4969028Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4969624Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4969628Z 
2025-12-04T10:11:57.4969797Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4970008Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4970106Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4970649Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4970780Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4970838Z graph_break []
2025-12-04T10:11:57.4970964Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4971058Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4971178Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4971721Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4971784Z graph_break []
2025-12-04T10:11:57.4971865Z =================================== FAILURES ===================================
2025-12-04T10:11:57.4972157Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4972229Z Traceback (most recent call last):
2025-12-04T10:11:57.4972529Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4972601Z     method(*args, **kwargs)
2025-12-04T10:11:57.4972892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4972958Z     method(*args, **kwargs)
2025-12-04T10:11:57.4973252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4973313Z     with policy():
2025-12-04T10:11:57.4973613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4973678Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4974500Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4974505Z 
2025-12-04T10:11:57.4974628Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4975150Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4975161Z 
2025-12-04T10:11:57.4975316Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4975443Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4975539Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4976079Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4976271Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4976346Z graph_break []
2025-12-04T10:11:57.4976469Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4976562Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4976748Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4977283Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4977346Z graph_break []
2025-12-04T10:11:57.4977469Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4977562Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4977683Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4978220Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4978286Z graph_break []
2025-12-04T10:11:57.4978782Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8de08e52169132e4.xml -
2025-12-04T10:11:57.4978891Z =========================== short test summary info ============================
2025-12-04T10:11:57.4980178Z FAILED [0.5338s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.4980183Z 
2025-12-04T10:11:57.4980309Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4980824Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4980831Z 
2025-12-04T10:11:57.4980986Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4981096Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.4981210Z =================== 1 failed, 8 deselected, 2 rerun in 3.03s ===================
2025-12-04T10:11:57.4981267Z Got exit code 1
2025-12-04T10:11:57.4981335Z Retrying single test...
2025-12-04T10:11:57.4981597Z W1204 09:33:09.985000 38860 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.4981985Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c872303ed892824.xml
2025-12-04T10:11:57.4982080Z ============================= test session starts ==============================
2025-12-04T10:11:57.4982288Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.4982357Z cachedir: .pytest_cache
2025-12-04T10:11:57.4982662Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.4982742Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.4982807Z configfile: pytest.ini
2025-12-04T10:11:57.4983119Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.4983321Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.4983892Z stepcurrent: skipping 8 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4984027Z Running 1 items in this shard
2025-12-04T10:11:57.4984035Z 
2025-12-04T10:11:57.4984763Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:33:11.576369425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4984767Z 
2025-12-04T10:11:57.4985064Z [W1204 09:33:20.581721768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4985072Z 
2025-12-04T10:11:57.4985365Z [W1204 09:33:20.581983773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4985369Z 
2025-12-04T10:11:57.4985657Z [W1204 09:33:20.588022667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4985664Z 
2025-12-04T10:11:57.4985956Z [W1204 09:33:20.588639418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4985960Z 
2025-12-04T10:11:57.4986246Z [W1204 09:33:20.588820061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4986249Z 
2025-12-04T10:11:57.4986547Z [W1204 09:33:20.594316085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4986551Z 
2025-12-04T10:11:57.4986842Z [W1204 09:33:20.594870644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4986846Z 
2025-12-04T10:11:57.4987138Z [W1204 09:33:20.595034167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4987143Z 
2025-12-04T10:11:57.4987234Z ('RERUN', {'yellow': True}) [10.9543s] [100%]
2025-12-04T10:11:57.4987958Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:33:21.398347004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4987966Z 
2025-12-04T10:11:57.4988262Z [W1204 09:33:21.398897023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4988266Z 
2025-12-04T10:11:57.4988558Z [W1204 09:33:21.399037876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4988561Z 
2025-12-04T10:11:57.4988854Z [W1204 09:33:21.402048727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4988860Z 
2025-12-04T10:11:57.4989148Z [W1204 09:33:21.402506385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4989151Z 
2025-12-04T10:11:57.4989442Z [W1204 09:33:21.402646787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4989445Z 
2025-12-04T10:11:57.4989733Z [W1204 09:33:21.407247326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4989736Z 
2025-12-04T10:11:57.4990102Z [W1204 09:33:21.407713024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4990105Z 
2025-12-04T10:11:57.4990396Z [W1204 09:33:21.407857206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4990399Z 
2025-12-04T10:11:57.4990544Z ('RERUN', {'yellow': True}) [0.5005s] [100%]
2025-12-04T10:11:57.4991269Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:33:21.897403745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4991273Z 
2025-12-04T10:11:57.4991757Z [W1204 09:33:21.897958635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4991766Z 
2025-12-04T10:11:57.4992081Z [W1204 09:33:21.898098617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4992084Z 
2025-12-04T10:11:57.4992377Z [W1204 09:33:21.901065158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4992381Z 
2025-12-04T10:11:57.4992682Z [W1204 09:33:21.901525035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4992685Z 
2025-12-04T10:11:57.4992976Z [W1204 09:33:21.901663358 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4992979Z 
2025-12-04T10:11:57.4993272Z [W1204 09:33:21.906195075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4993276Z 
2025-12-04T10:11:57.4993565Z [W1204 09:33:21.906654353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4993570Z 
2025-12-04T10:11:57.4993862Z [W1204 09:33:21.906792356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.4993865Z 
2025-12-04T10:11:57.4993937Z FAILED [0.4959s] [100%]
2025-12-04T10:11:57.4993944Z 
2025-12-04T10:11:57.4994036Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.4994341Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.4994417Z Traceback (most recent call last):
2025-12-04T10:11:57.4994729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4994795Z     method(*args, **kwargs)
2025-12-04T10:11:57.4995089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.4995160Z     method(*args, **kwargs)
2025-12-04T10:11:57.4995451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.4995514Z     with policy():
2025-12-04T10:11:57.4995814Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.4995879Z     raise RuntimeError(msg)
2025-12-04T10:11:57.4996684Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.4996688Z 
2025-12-04T10:11:57.4996823Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.4997438Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.4997445Z 
2025-12-04T10:11:57.4997611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.4997832Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.4997944Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.4998496Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.4998629Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.4998688Z graph_break []
2025-12-04T10:11:57.4998813Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.4999516Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.4999594Z   if out == self.unknown_value:
2025-12-04T10:11:57.4999974Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5000057Z Traceback (most recent call last):
2025-12-04T10:11:57.5000403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5000476Z     method(*args, **kwargs)
2025-12-04T10:11:57.5000774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5000837Z     method(*args, **kwargs)
2025-12-04T10:11:57.5001132Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5001191Z     with policy():
2025-12-04T10:11:57.5001486Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5001555Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5002362Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5002370Z 
2025-12-04T10:11:57.5002498Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5003017Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5003020Z 
2025-12-04T10:11:57.5003181Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5003317Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5003416Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5003964Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5004093Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5004155Z graph_break []
2025-12-04T10:11:57.5004279Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5005046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5005123Z   if out == self.unknown_value:
2025-12-04T10:11:57.5005243Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5005402Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5005524Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5006065Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5006131Z graph_break []
2025-12-04T10:11:57.5006214Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5006508Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5006583Z Traceback (most recent call last):
2025-12-04T10:11:57.5006892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5006968Z     method(*args, **kwargs)
2025-12-04T10:11:57.5007259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5007323Z     method(*args, **kwargs)
2025-12-04T10:11:57.5007613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5007673Z     with policy():
2025-12-04T10:11:57.5007977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5008041Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5008860Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5008871Z 
2025-12-04T10:11:57.5008998Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5009510Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5009514Z 
2025-12-04T10:11:57.5009673Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5009798Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5009896Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5010439Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5010566Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5010629Z graph_break []
2025-12-04T10:11:57.5010749Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5011436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5011507Z   if out == self.unknown_value:
2025-12-04T10:11:57.5011628Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5011801Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5011926Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5012465Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5012592Z graph_break []
2025-12-04T10:11:57.5012720Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5012815Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5012936Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5013475Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5013542Z graph_break []
2025-12-04T10:11:57.5014031Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c872303ed892824.xml -
2025-12-04T10:11:57.5014134Z =========================== short test summary info ============================
2025-12-04T10:11:57.5015421Z FAILED [0.4959s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5015425Z 
2025-12-04T10:11:57.5015555Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5016075Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5016079Z 
2025-12-04T10:11:57.5016234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5016342Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5016457Z ================== 1 failed, 57 deselected, 2 rerun in 11.98s ==================
2025-12-04T10:11:57.5016585Z Got exit code 1
2025-12-04T10:11:57.5025220Z Retrying single test...
2025-12-04T10:11:57.5025605Z W1204 09:33:28.518000 39047 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5026019Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86972921d28d1709.xml
2025-12-04T10:11:57.5026140Z ============================= test session starts ==============================
2025-12-04T10:11:57.5026373Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5026446Z cachedir: .pytest_cache
2025-12-04T10:11:57.5026771Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5026858Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5026926Z configfile: pytest.ini
2025-12-04T10:11:57.5027259Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5027393Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5028159Z stepcurrent: skipping 8 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5028254Z Running 1 items in this shard
2025-12-04T10:11:57.5028259Z 
2025-12-04T10:11:57.5029009Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:33:30.095599455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5029106Z 
2025-12-04T10:11:57.5029422Z [W1204 09:33:39.157059392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5029426Z 
2025-12-04T10:11:57.5029758Z [W1204 09:33:39.157319097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5029762Z 
2025-12-04T10:11:57.5030058Z [W1204 09:33:39.163698926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5030062Z 
2025-12-04T10:11:57.5030351Z [W1204 09:33:39.164292726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5030355Z 
2025-12-04T10:11:57.5030647Z [W1204 09:33:39.164489579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5030653Z 
2025-12-04T10:11:57.5030942Z [W1204 09:33:39.169829230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5030945Z 
2025-12-04T10:11:57.5031237Z [W1204 09:33:39.170698485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5031241Z 
2025-12-04T10:11:57.5031530Z [W1204 09:33:39.170865868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5031536Z 
2025-12-04T10:11:57.5031620Z ('RERUN', {'yellow': True}) [10.9960s] [100%]
2025-12-04T10:11:57.5032378Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:33:40.966166290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5032387Z 
2025-12-04T10:11:57.5032678Z [W1204 09:33:40.966687179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5032682Z 
2025-12-04T10:11:57.5032972Z [W1204 09:33:40.966824491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5032976Z 
2025-12-04T10:11:57.5033262Z [W1204 09:33:40.969659470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5033268Z 
2025-12-04T10:11:57.5033562Z [W1204 09:33:40.970119147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5033565Z 
2025-12-04T10:11:57.5033855Z [W1204 09:33:40.970260680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5033861Z 
2025-12-04T10:11:57.5034151Z [W1204 09:33:40.974637874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5034155Z 
2025-12-04T10:11:57.5034442Z [W1204 09:33:40.975086502 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5034445Z 
2025-12-04T10:11:57.5034735Z [W1204 09:33:40.975221514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5034738Z 
2025-12-04T10:11:57.5034892Z ('RERUN', {'yellow': True}) [0.4908s] [100%]
2025-12-04T10:11:57.5035610Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:33:40.455635559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5035683Z 
2025-12-04T10:11:57.5035972Z [W1204 09:33:40.456165199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5035975Z 
2025-12-04T10:11:57.5036262Z [W1204 09:33:40.456312511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5036265Z 
2025-12-04T10:11:57.5036556Z [W1204 09:33:40.459096058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5036559Z 
2025-12-04T10:11:57.5036847Z [W1204 09:33:40.459530056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5036850Z 
2025-12-04T10:11:57.5037141Z [W1204 09:33:40.459666068 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5037148Z 
2025-12-04T10:11:57.5037434Z [W1204 09:33:40.464033353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5037438Z 
2025-12-04T10:11:57.5037727Z [W1204 09:33:40.464492771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5037731Z 
2025-12-04T10:11:57.5038017Z [W1204 09:33:40.464629483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5038021Z 
2025-12-04T10:11:57.5038087Z FAILED [0.4879s] [100%]
2025-12-04T10:11:57.5038095Z 
2025-12-04T10:11:57.5038186Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5038483Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5038578Z Traceback (most recent call last):
2025-12-04T10:11:57.5038906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5038976Z     method(*args, **kwargs)
2025-12-04T10:11:57.5039280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5039345Z     method(*args, **kwargs)
2025-12-04T10:11:57.5039641Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5039705Z     with policy():
2025-12-04T10:11:57.5040096Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5040172Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5041003Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5041010Z 
2025-12-04T10:11:57.5041149Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5041669Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5041673Z 
2025-12-04T10:11:57.5041926Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5042075Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5042173Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5042730Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5043367Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5043467Z graph_break []
2025-12-04T10:11:57.5043710Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5047811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5047901Z   if out == self.unknown_value:
2025-12-04T10:11:57.5048210Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5048290Z Traceback (most recent call last):
2025-12-04T10:11:57.5048665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5048754Z     method(*args, **kwargs)
2025-12-04T10:11:57.5049121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5049200Z     method(*args, **kwargs)
2025-12-04T10:11:57.5049571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5049646Z     with policy():
2025-12-04T10:11:57.5050020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5050100Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5051115Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5051128Z 
2025-12-04T10:11:57.5051295Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5051947Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5051952Z 
2025-12-04T10:11:57.5052155Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5052327Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5052450Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5053131Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5053295Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5053371Z graph_break []
2025-12-04T10:11:57.5053524Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5054386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5054472Z   if out == self.unknown_value:
2025-12-04T10:11:57.5054729Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5054851Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5055003Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5055783Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5055856Z graph_break []
2025-12-04T10:11:57.5055957Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5056325Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5056416Z Traceback (most recent call last):
2025-12-04T10:11:57.5056790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5077593Z     method(*args, **kwargs)
2025-12-04T10:11:57.5078080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5078237Z     method(*args, **kwargs)
2025-12-04T10:11:57.5079087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5079254Z     with policy():
2025-12-04T10:11:57.5080280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5080441Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5082418Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5082433Z 
2025-12-04T10:11:57.5082746Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5083994Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5084012Z 
2025-12-04T10:11:57.5084402Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5084715Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5084948Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5086248Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5086557Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5086702Z graph_break []
2025-12-04T10:11:57.5087000Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5088889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5089121Z   if out == self.unknown_value:
2025-12-04T10:11:57.5089471Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5089735Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5090084Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5091862Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5092038Z graph_break []
2025-12-04T10:11:57.5092379Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5092802Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5093012Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5093617Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5093675Z graph_break []
2025-12-04T10:11:57.5094160Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86972921d28d1709.xml -
2025-12-04T10:11:57.5094264Z =========================== short test summary info ============================
2025-12-04T10:11:57.5095557Z FAILED [0.4879s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5095565Z 
2025-12-04T10:11:57.5095692Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5096208Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5096212Z 
2025-12-04T10:11:57.5096373Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5096490Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5096613Z ================== 1 failed, 57 deselected, 2 rerun in 12.00s ==================
2025-12-04T10:11:57.5096676Z Got exit code 1
2025-12-04T10:11:57.5097155Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5097402Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.5097669Z W1204 09:33:47.097000 39233 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5098064Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-08f8aa88da0d4c3d.xml
2025-12-04T10:11:57.5098171Z ============================= test session starts ==============================
2025-12-04T10:11:57.5098389Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5098458Z cachedir: .pytest_cache
2025-12-04T10:11:57.5098767Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5098852Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5098916Z configfile: pytest.ini
2025-12-04T10:11:57.5099232Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5099364Z collecting ... collected 58 items / 9 deselected / 49 selected
2025-12-04T10:11:57.5099451Z stepcurrent: skipping 9 already run items.
2025-12-04T10:11:57.5099527Z Running 49 items in this shard
2025-12-04T10:11:57.5099530Z 
2025-12-04T10:11:57.5100101Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9189s] [  2%]
2025-12-04T10:11:57.5100595Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5388s] [  2%]
2025-12-04T10:11:57.5101104Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.5365s] [  2%]
2025-12-04T10:11:57.5101108Z 
2025-12-04T10:11:57.5101189Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5101487Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5101563Z Traceback (most recent call last):
2025-12-04T10:11:57.5101877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5101942Z     method(*args, **kwargs)
2025-12-04T10:11:57.5102235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5102307Z     method(*args, **kwargs)
2025-12-04T10:11:57.5102596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5102655Z     with policy():
2025-12-04T10:11:57.5102955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5103020Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5103824Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5103829Z 
2025-12-04T10:11:57.5103957Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5104487Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5104491Z 
2025-12-04T10:11:57.5104652Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5104780Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5104883Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5105430Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5105565Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5105625Z graph_break []
2025-12-04T10:11:57.5105917Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5105996Z Traceback (most recent call last):
2025-12-04T10:11:57.5106295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5106358Z     method(*args, **kwargs)
2025-12-04T10:11:57.5106655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5106717Z     method(*args, **kwargs)
2025-12-04T10:11:57.5107004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5107224Z     with policy():
2025-12-04T10:11:57.5107519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5107588Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5108467Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5108471Z 
2025-12-04T10:11:57.5108598Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5109119Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5109122Z 
2025-12-04T10:11:57.5109278Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5109410Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5109502Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5110049Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5110181Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5110250Z graph_break []
2025-12-04T10:11:57.5110380Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5110468Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5110592Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5111144Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5111200Z graph_break []
2025-12-04T10:11:57.5111291Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5111584Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5111658Z Traceback (most recent call last):
2025-12-04T10:11:57.5111961Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5112025Z     method(*args, **kwargs)
2025-12-04T10:11:57.5112316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5112382Z     method(*args, **kwargs)
2025-12-04T10:11:57.5112670Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5112734Z     with policy():
2025-12-04T10:11:57.5113025Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5113093Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5113923Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5113927Z 
2025-12-04T10:11:57.5114056Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5114675Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5114679Z 
2025-12-04T10:11:57.5114839Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5115037Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5115128Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5115678Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5115805Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5115867Z graph_break []
2025-12-04T10:11:57.5115992Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5116086Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5116216Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5116761Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5116823Z graph_break []
2025-12-04T10:11:57.5116943Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5117206Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5117339Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5117880Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5117941Z graph_break []
2025-12-04T10:11:57.5118442Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-08f8aa88da0d4c3d.xml -
2025-12-04T10:11:57.5118543Z =========================== short test summary info ============================
2025-12-04T10:11:57.5119832Z FAILED [0.5365s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5119836Z 
2025-12-04T10:11:57.5120002Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5120526Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5120530Z 
2025-12-04T10:11:57.5120693Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5120796Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5120909Z =================== 1 failed, 9 deselected, 2 rerun in 3.02s ===================
2025-12-04T10:11:57.5120973Z Got exit code 1
2025-12-04T10:11:57.5121037Z Retrying single test...
2025-12-04T10:11:57.5121300Z W1204 09:33:56.766000 39415 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5121700Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6e5694a381ab599.xml
2025-12-04T10:11:57.5121914Z ============================= test session starts ==============================
2025-12-04T10:11:57.5122130Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5122196Z cachedir: .pytest_cache
2025-12-04T10:11:57.5122592Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5122673Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5122739Z configfile: pytest.ini
2025-12-04T10:11:57.5123060Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5123188Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5123760Z stepcurrent: skipping 9 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5123836Z Running 1 items in this shard
2025-12-04T10:11:57.5123839Z 
2025-12-04T10:11:57.5124568Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:33:58.364642927 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5124575Z 
2025-12-04T10:11:57.5124879Z [W1204 09:34:07.508603962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5124883Z 
2025-12-04T10:11:57.5125173Z [W1204 09:34:07.508866637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5125176Z 
2025-12-04T10:11:57.5125473Z [W1204 09:34:07.514919610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5125477Z 
2025-12-04T10:11:57.5125765Z [W1204 09:34:07.515533741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5125768Z 
2025-12-04T10:11:57.5126059Z [W1204 09:34:07.515715014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5126065Z 
2025-12-04T10:11:57.5126352Z [W1204 09:34:07.521246868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5126355Z 
2025-12-04T10:11:57.5126643Z [W1204 09:34:07.521770947 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5126646Z 
2025-12-04T10:11:57.5126939Z [W1204 09:34:07.521934730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5126945Z 
2025-12-04T10:11:57.5127031Z ('RERUN', {'yellow': True}) [11.0976s] [100%]
2025-12-04T10:11:57.5127756Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:34:08.326060414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5127762Z 
2025-12-04T10:11:57.5128052Z [W1204 09:34:08.326621193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5128056Z 
2025-12-04T10:11:57.5128347Z [W1204 09:34:08.326763596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5128350Z 
2025-12-04T10:11:57.5128637Z [W1204 09:34:08.329663705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5128711Z 
2025-12-04T10:11:57.5129024Z [W1204 09:34:08.330134104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5129028Z 
2025-12-04T10:11:57.5129318Z [W1204 09:34:08.330274326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5129385Z 
2025-12-04T10:11:57.5129674Z [W1204 09:34:08.334745593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5129682Z 
2025-12-04T10:11:57.5129970Z [W1204 09:34:08.335193161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5129973Z 
2025-12-04T10:11:57.5130262Z [W1204 09:34:08.335329273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5130265Z 
2025-12-04T10:11:57.5130352Z ('RERUN', {'yellow': True}) [0.5006s] [100%]
2025-12-04T10:11:57.5131070Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:34:08.824520567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5131077Z 
2025-12-04T10:11:57.5131374Z [W1204 09:34:08.825056926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5131377Z 
2025-12-04T10:11:57.5131665Z [W1204 09:34:08.825200848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5131668Z 
2025-12-04T10:11:57.5131958Z [W1204 09:34:08.828193859 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5131961Z 
2025-12-04T10:11:57.5132252Z [W1204 09:34:08.828654707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5132255Z 
2025-12-04T10:11:57.5132547Z [W1204 09:34:08.828793080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5132553Z 
2025-12-04T10:11:57.5132841Z [W1204 09:34:08.833453559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5132845Z 
2025-12-04T10:11:57.5133134Z [W1204 09:34:08.833903847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5133137Z 
2025-12-04T10:11:57.5133433Z [W1204 09:34:08.834038189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5133436Z 
2025-12-04T10:11:57.5133500Z FAILED [0.4986s] [100%]
2025-12-04T10:11:57.5133503Z 
2025-12-04T10:11:57.5133588Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5133882Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5133960Z Traceback (most recent call last):
2025-12-04T10:11:57.5134274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5134338Z     method(*args, **kwargs)
2025-12-04T10:11:57.5134642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5134706Z     method(*args, **kwargs)
2025-12-04T10:11:57.5134998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5135072Z     with policy():
2025-12-04T10:11:57.5135437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5135504Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5136305Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5136374Z 
2025-12-04T10:11:57.5136501Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5137022Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5137026Z 
2025-12-04T10:11:57.5137186Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5137318Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5137410Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5137957Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5138094Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5138152Z graph_break []
2025-12-04T10:11:57.5138279Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5138965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5139038Z   if out == self.unknown_value:
2025-12-04T10:11:57.5139332Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5139404Z Traceback (most recent call last):
2025-12-04T10:11:57.5139712Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5139774Z     method(*args, **kwargs)
2025-12-04T10:11:57.5140062Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5140132Z     method(*args, **kwargs)
2025-12-04T10:11:57.5140424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5140483Z     with policy():
2025-12-04T10:11:57.5140784Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5140859Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5141671Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5141678Z 
2025-12-04T10:11:57.5141804Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5142322Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5142326Z 
2025-12-04T10:11:57.5142484Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5142706Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5142810Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5143353Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5143568Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5143628Z graph_break []
2025-12-04T10:11:57.5143754Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5144456Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5144527Z   if out == self.unknown_value:
2025-12-04T10:11:57.5144653Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5144748Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5144871Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5145419Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5145478Z graph_break []
2025-12-04T10:11:57.5145559Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5145850Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5145922Z Traceback (most recent call last):
2025-12-04T10:11:57.5146226Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5146294Z     method(*args, **kwargs)
2025-12-04T10:11:57.5146584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5146649Z     method(*args, **kwargs)
2025-12-04T10:11:57.5146941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5147001Z     with policy():
2025-12-04T10:11:57.5147298Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5147361Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5148180Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5148185Z 
2025-12-04T10:11:57.5148307Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5148831Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5148837Z 
2025-12-04T10:11:57.5148992Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5149114Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5149206Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5149744Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5149955Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5150016Z graph_break []
2025-12-04T10:11:57.5150138Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5150827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5150966Z   if out == self.unknown_value:
2025-12-04T10:11:57.5151090Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5151179Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5151299Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5151842Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5151901Z graph_break []
2025-12-04T10:11:57.5152022Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5152116Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5152235Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5152773Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5152833Z graph_break []
2025-12-04T10:11:57.5153318Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6e5694a381ab599.xml -
2025-12-04T10:11:57.5153424Z =========================== short test summary info ============================
2025-12-04T10:11:57.5154723Z FAILED [0.4986s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5154729Z 
2025-12-04T10:11:57.5154857Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5155371Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5155374Z 
2025-12-04T10:11:57.5155534Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5155636Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5155749Z ================== 1 failed, 57 deselected, 2 rerun in 12.12s ==================
2025-12-04T10:11:57.5155824Z Got exit code 1
2025-12-04T10:11:57.5155891Z Retrying single test...
2025-12-04T10:11:57.5156160Z W1204 09:34:15.555000 39602 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5156546Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a1c1c2119d10732c.xml
2025-12-04T10:11:57.5156639Z ============================= test session starts ==============================
2025-12-04T10:11:57.5156849Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5156915Z cachedir: .pytest_cache
2025-12-04T10:11:57.5157294Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5157375Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5157440Z configfile: pytest.ini
2025-12-04T10:11:57.5157824Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5157949Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5158518Z stepcurrent: skipping 9 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5158595Z Running 1 items in this shard
2025-12-04T10:11:57.5158599Z 
2025-12-04T10:11:57.5159343Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:34:17.144401272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5159346Z 
2025-12-04T10:11:57.5159649Z [W1204 09:34:26.278662521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5159656Z 
2025-12-04T10:11:57.5160004Z [W1204 09:34:26.278929776 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5160008Z 
2025-12-04T10:11:57.5160307Z [W1204 09:34:26.285086872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5160310Z 
2025-12-04T10:11:57.5160597Z [W1204 09:34:26.285711462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5160600Z 
2025-12-04T10:11:57.5160894Z [W1204 09:34:26.285899886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5160898Z 
2025-12-04T10:11:57.5161186Z [W1204 09:34:26.291361730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5161192Z 
2025-12-04T10:11:57.5161479Z [W1204 09:34:26.291891459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5161485Z 
2025-12-04T10:11:57.5161785Z [W1204 09:34:26.292048902 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5161788Z 
2025-12-04T10:11:57.5161868Z ('RERUN', {'yellow': True}) [11.0728s] [100%]
2025-12-04T10:11:57.5162598Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:34:27.084410851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5162602Z 
2025-12-04T10:11:57.5162892Z [W1204 09:34:27.084933660 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5162898Z 
2025-12-04T10:11:57.5163190Z [W1204 09:34:27.085072762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5163193Z 
2025-12-04T10:11:57.5163478Z [W1204 09:34:27.087915501 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5163480Z 
2025-12-04T10:11:57.5163770Z [W1204 09:34:27.088369468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5163773Z 
2025-12-04T10:11:57.5164141Z [W1204 09:34:27.088506801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5164148Z 
2025-12-04T10:11:57.5164440Z [W1204 09:34:27.093030858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5164527Z 
2025-12-04T10:11:57.5164820Z [W1204 09:34:27.093485156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5164825Z 
2025-12-04T10:11:57.5165119Z [W1204 09:34:27.093621158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5165126Z 
2025-12-04T10:11:57.5165202Z ('RERUN', {'yellow': True}) [0.4927s] [100%]
2025-12-04T10:11:57.5165920Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:34:27.574981942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5165924Z 
2025-12-04T10:11:57.5166217Z [W1204 09:34:27.575529791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5166224Z 
2025-12-04T10:11:57.5166509Z [W1204 09:34:27.575669134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5166513Z 
2025-12-04T10:11:57.5166804Z [W1204 09:34:27.578544153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5166807Z 
2025-12-04T10:11:57.5167095Z [W1204 09:34:27.578988180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5167097Z 
2025-12-04T10:11:57.5167393Z [W1204 09:34:27.579125523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5167396Z 
2025-12-04T10:11:57.5167681Z [W1204 09:34:27.583512698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5167687Z 
2025-12-04T10:11:57.5167973Z [W1204 09:34:27.583967825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5167980Z 
2025-12-04T10:11:57.5168267Z [W1204 09:34:27.584105988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5168270Z 
2025-12-04T10:11:57.5168331Z FAILED [0.4872s] [100%]
2025-12-04T10:11:57.5168334Z 
2025-12-04T10:11:57.5168422Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5168714Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5168790Z Traceback (most recent call last):
2025-12-04T10:11:57.5169095Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5169163Z     method(*args, **kwargs)
2025-12-04T10:11:57.5169460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5169522Z     method(*args, **kwargs)
2025-12-04T10:11:57.5169809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5169883Z     with policy():
2025-12-04T10:11:57.5170179Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5170248Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5171112Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5171181Z 
2025-12-04T10:11:57.5171319Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5171837Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5171841Z 
2025-12-04T10:11:57.5172005Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5172129Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5172221Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5172771Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5172898Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5172962Z graph_break []
2025-12-04T10:11:57.5173085Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5173781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5173856Z   if out == self.unknown_value:
2025-12-04T10:11:57.5174143Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5174221Z Traceback (most recent call last):
2025-12-04T10:11:57.5174518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5174592Z     method(*args, **kwargs)
2025-12-04T10:11:57.5174888Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5174953Z     method(*args, **kwargs)
2025-12-04T10:11:57.5175240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5175306Z     with policy():
2025-12-04T10:11:57.5175600Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5175668Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5176473Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5176477Z 
2025-12-04T10:11:57.5176600Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5177126Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5177130Z 
2025-12-04T10:11:57.5177286Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5177413Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5177504Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5178118Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5178247Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5178305Z graph_break []
2025-12-04T10:11:57.5178499Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5179185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5179255Z   if out == self.unknown_value:
2025-12-04T10:11:57.5179379Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5179468Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5179592Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5180131Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5180191Z graph_break []
2025-12-04T10:11:57.5180277Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5180564Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5180639Z Traceback (most recent call last):
2025-12-04T10:11:57.5180935Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5180997Z     method(*args, **kwargs)
2025-12-04T10:11:57.5181290Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5181353Z     method(*args, **kwargs)
2025-12-04T10:11:57.5181643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5181707Z     with policy():
2025-12-04T10:11:57.5181998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5182071Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5182885Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5182890Z 
2025-12-04T10:11:57.5183013Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5183536Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5183540Z 
2025-12-04T10:11:57.5183692Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5183819Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5183908Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5184465Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5184594Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5184653Z graph_break []
2025-12-04T10:11:57.5184778Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5185537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5185672Z   if out == self.unknown_value:
2025-12-04T10:11:57.5185798Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5185885Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5186010Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5186550Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5186607Z graph_break []
2025-12-04T10:11:57.5186731Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5186820Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5186943Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5187477Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5187536Z graph_break []
2025-12-04T10:11:57.5188027Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a1c1c2119d10732c.xml -
2025-12-04T10:11:57.5188126Z =========================== short test summary info ============================
2025-12-04T10:11:57.5189416Z FAILED [0.4872s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5189422Z 
2025-12-04T10:11:57.5189547Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5190065Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5190068Z 
2025-12-04T10:11:57.5190223Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5190324Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5190442Z ================== 1 failed, 57 deselected, 2 rerun in 12.08s ==================
2025-12-04T10:11:57.5190502Z Got exit code 1
2025-12-04T10:11:57.5190981Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5191227Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.5191488Z W1204 09:34:34.190000 39788 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5191874Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c94947c9bb46a4e.xml
2025-12-04T10:11:57.5191967Z ============================= test session starts ==============================
2025-12-04T10:11:57.5192177Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5192315Z cachedir: .pytest_cache
2025-12-04T10:11:57.5192622Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5192702Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5192768Z configfile: pytest.ini
2025-12-04T10:11:57.5193179Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5193308Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T10:11:57.5193397Z stepcurrent: skipping 10 already run items.
2025-12-04T10:11:57.5193471Z Running 48 items in this shard
2025-12-04T10:11:57.5193475Z 
2025-12-04T10:11:57.5193979Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9162s] [  2%]
2025-12-04T10:11:57.5194475Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4936s] [  2%]
2025-12-04T10:11:57.5194932Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4892s] [  2%]
2025-12-04T10:11:57.5194938Z 
2025-12-04T10:11:57.5195019Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5195319Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5195394Z Traceback (most recent call last):
2025-12-04T10:11:57.5195697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5195768Z     method(*args, **kwargs)
2025-12-04T10:11:57.5196061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5196128Z     method(*args, **kwargs)
2025-12-04T10:11:57.5196419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5196481Z     with policy():
2025-12-04T10:11:57.5196781Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5196846Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5197656Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5197660Z 
2025-12-04T10:11:57.5197788Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5198306Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5198318Z 
2025-12-04T10:11:57.5198478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5198602Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5198699Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5199048Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5199176Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5199238Z graph_break []
2025-12-04T10:11:57.5199604Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5199683Z Traceback (most recent call last):
2025-12-04T10:11:57.5200038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5200169Z     method(*args, **kwargs)
2025-12-04T10:11:57.5200464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5200525Z     method(*args, **kwargs)
2025-12-04T10:11:57.5200812Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5200876Z     with policy():
2025-12-04T10:11:57.5201170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5201238Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5202061Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5202068Z 
2025-12-04T10:11:57.5202190Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5202712Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5202716Z 
2025-12-04T10:11:57.5202885Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5203019Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5203110Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5203463Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5203587Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5203648Z graph_break []
2025-12-04T10:11:57.5203774Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5203861Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5203980Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5204328Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5204386Z graph_break []
2025-12-04T10:11:57.5204472Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5204778Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5204852Z Traceback (most recent call last):
2025-12-04T10:11:57.5205155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5205219Z     method(*args, **kwargs)
2025-12-04T10:11:57.5205511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5205578Z     method(*args, **kwargs)
2025-12-04T10:11:57.5205866Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5205929Z     with policy():
2025-12-04T10:11:57.5206223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5206286Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5207184Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5207251Z 
2025-12-04T10:11:57.5207376Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5207905Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5207908Z 
2025-12-04T10:11:57.5208069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5208193Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5208302Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5208652Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5208781Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5208848Z graph_break []
2025-12-04T10:11:57.5208969Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5209062Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5209182Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5209526Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5209583Z graph_break []
2025-12-04T10:11:57.5209705Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5209800Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5209924Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5210260Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5210324Z graph_break []
2025-12-04T10:11:57.5210809Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c94947c9bb46a4e.xml -
2025-12-04T10:11:57.5210912Z =========================== short test summary info ============================
2025-12-04T10:11:57.5212217Z FAILED [0.4892s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5212222Z 
2025-12-04T10:11:57.5212348Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5212872Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5212876Z 
2025-12-04T10:11:57.5213029Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5213139Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5213252Z ================== 1 failed, 10 deselected, 2 rerun in 2.92s ===================
2025-12-04T10:11:57.5213315Z Got exit code 1
2025-12-04T10:11:57.5213390Z Retrying single test...
2025-12-04T10:11:57.5213727Z W1204 09:34:43.849000 39976 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5214124Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e2657ebcfa165043.xml
2025-12-04T10:11:57.5214357Z ============================= test session starts ==============================
2025-12-04T10:11:57.5214572Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5214638Z cachedir: .pytest_cache
2025-12-04T10:11:57.5214947Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5215026Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5215090Z configfile: pytest.ini
2025-12-04T10:11:57.5215410Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5215544Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5216123Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5216203Z Running 1 items in this shard
2025-12-04T10:11:57.5216207Z 
2025-12-04T10:11:57.5216940Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:34:45.167859200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5216944Z 
2025-12-04T10:11:57.5217401Z [W1204 09:34:54.130987437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5217406Z 
2025-12-04T10:11:57.5217703Z [W1204 09:34:54.131248622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5217706Z 
2025-12-04T10:11:57.5217995Z [W1204 09:34:54.137133722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5218005Z 
2025-12-04T10:11:57.5218291Z [W1204 09:34:54.137729203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5218294Z 
2025-12-04T10:11:57.5218581Z [W1204 09:34:54.137899585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5218584Z 
2025-12-04T10:11:57.5218872Z [W1204 09:34:54.143390450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5218876Z 
2025-12-04T10:11:57.5219162Z [W1204 09:34:54.143923449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5219165Z 
2025-12-04T10:11:57.5219456Z [W1204 09:34:54.144082041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5219462Z 
2025-12-04T10:11:57.5219542Z ('RERUN', {'yellow': True}) [10.9045s] [100%]
2025-12-04T10:11:57.5220279Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:34:55.173806576 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5220284Z 
2025-12-04T10:11:57.5220572Z [W1204 09:34:55.174334455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5220575Z 
2025-12-04T10:11:57.5221003Z [W1204 09:34:55.174478037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5221008Z 
2025-12-04T10:11:57.5221301Z [W1204 09:34:55.177397707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5221393Z 
2025-12-04T10:11:57.5221684Z [W1204 09:34:55.177960356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5221687Z 
2025-12-04T10:11:57.5221976Z [W1204 09:34:55.178098729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5221979Z 
2025-12-04T10:11:57.5222271Z [W1204 09:34:55.182648047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5222274Z 
2025-12-04T10:11:57.5222566Z [W1204 09:34:55.183113265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5222570Z 
2025-12-04T10:11:57.5222858Z [W1204 09:34:55.183255018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5222864Z 
2025-12-04T10:11:57.5222946Z ('RERUN', {'yellow': True}) [0.4514s] [100%]
2025-12-04T10:11:57.5223672Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:34:55.622542123 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5223676Z 
2025-12-04T10:11:57.5223967Z [W1204 09:34:55.623061772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5223971Z 
2025-12-04T10:11:57.5224259Z [W1204 09:34:55.623200325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5224262Z 
2025-12-04T10:11:57.5224550Z [W1204 09:34:55.626146935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5224561Z 
2025-12-04T10:11:57.5224848Z [W1204 09:34:55.626706394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5224851Z 
2025-12-04T10:11:57.5225138Z [W1204 09:34:55.626851467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5225141Z 
2025-12-04T10:11:57.5225435Z [W1204 09:34:55.631423085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5225438Z 
2025-12-04T10:11:57.5225728Z [W1204 09:34:55.631888363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5225730Z 
2025-12-04T10:11:57.5226022Z [W1204 09:34:55.632026416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5226029Z 
2025-12-04T10:11:57.5226091Z FAILED [0.4485s] [100%]
2025-12-04T10:11:57.5226094Z 
2025-12-04T10:11:57.5226178Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5226478Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5226551Z Traceback (most recent call last):
2025-12-04T10:11:57.5226862Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5226928Z     method(*args, **kwargs)
2025-12-04T10:11:57.5227289Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5227357Z     method(*args, **kwargs)
2025-12-04T10:11:57.5227646Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5227776Z     with policy():
2025-12-04T10:11:57.5228070Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5228134Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5228953Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5228957Z 
2025-12-04T10:11:57.5229084Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5229610Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5229616Z 
2025-12-04T10:11:57.5229775Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5229906Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5230001Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5230353Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5230483Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5230541Z graph_break []
2025-12-04T10:11:57.5230662Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5231363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5231436Z   if out == self.unknown_value:
2025-12-04T10:11:57.5231743Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5231820Z Traceback (most recent call last):
2025-12-04T10:11:57.5232118Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5232190Z     method(*args, **kwargs)
2025-12-04T10:11:57.5232481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5232548Z     method(*args, **kwargs)
2025-12-04T10:11:57.5232840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5232898Z     with policy():
2025-12-04T10:11:57.5233198Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5233269Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5234097Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5234106Z 
2025-12-04T10:11:57.5234229Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5234825Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5234830Z 
2025-12-04T10:11:57.5234991Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5235118Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5235280Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5235625Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5235749Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5235813Z graph_break []
2025-12-04T10:11:57.5235935Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5236633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5236704Z   if out == self.unknown_value:
2025-12-04T10:11:57.5236825Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5236921Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5237045Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5237389Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5237454Z graph_break []
2025-12-04T10:11:57.5237537Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5237837Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5237913Z Traceback (most recent call last):
2025-12-04T10:11:57.5238208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5238275Z     method(*args, **kwargs)
2025-12-04T10:11:57.5238568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5238632Z     method(*args, **kwargs)
2025-12-04T10:11:57.5238925Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5238995Z     with policy():
2025-12-04T10:11:57.5239302Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5239366Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5240231Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5240240Z 
2025-12-04T10:11:57.5240362Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5240885Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5240889Z 
2025-12-04T10:11:57.5241051Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5241174Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5241270Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5241712Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5241841Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5241904Z graph_break []
2025-12-04T10:11:57.5242037Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5242812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5242886Z   if out == self.unknown_value:
2025-12-04T10:11:57.5243010Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5243103Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5243225Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5243568Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5243631Z graph_break []
2025-12-04T10:11:57.5243751Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5243848Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5243967Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5244306Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5244366Z graph_break []
2025-12-04T10:11:57.5244851Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e2657ebcfa165043.xml -
2025-12-04T10:11:57.5244950Z =========================== short test summary info ============================
2025-12-04T10:11:57.5246264Z FAILED [0.4485s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5246271Z 
2025-12-04T10:11:57.5246392Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5246918Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5246921Z 
2025-12-04T10:11:57.5247080Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5247185Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5247305Z ================== 1 failed, 57 deselected, 2 rerun in 11.83s ==================
2025-12-04T10:11:57.5247361Z Got exit code 1
2025-12-04T10:11:57.5247433Z Retrying single test...
2025-12-04T10:11:57.5247696Z W1204 09:35:02.239000 40169 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5248086Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e5a9540a53f5bbd7.xml
2025-12-04T10:11:57.5248180Z ============================= test session starts ==============================
2025-12-04T10:11:57.5248390Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5248459Z cachedir: .pytest_cache
2025-12-04T10:11:57.5248850Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5248932Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5248997Z configfile: pytest.ini
2025-12-04T10:11:57.5249308Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5249507Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5250078Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5250148Z Running 1 items in this shard
2025-12-04T10:11:57.5250155Z 
2025-12-04T10:11:57.5250895Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:35:03.538075638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5250899Z 
2025-12-04T10:11:57.5251194Z [W1204 09:35:12.307782340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5251200Z 
2025-12-04T10:11:57.5251494Z [W1204 09:35:12.308038615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5251497Z 
2025-12-04T10:11:57.5251788Z [W1204 09:35:12.313855174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5251791Z 
2025-12-04T10:11:57.5252084Z [W1204 09:35:12.314457804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5252087Z 
2025-12-04T10:11:57.5252387Z [W1204 09:35:12.314628887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5252390Z 
2025-12-04T10:11:57.5252683Z [W1204 09:35:12.320061900 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5252689Z 
2025-12-04T10:11:57.5252979Z [W1204 09:35:12.320602359 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5252982Z 
2025-12-04T10:11:57.5253269Z [W1204 09:35:12.320773092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5253272Z 
2025-12-04T10:11:57.5253351Z ('RERUN', {'yellow': True}) [10.6977s] [100%]
2025-12-04T10:11:57.5254081Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:35:13.354008733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5254091Z 
2025-12-04T10:11:57.5254380Z [W1204 09:35:13.354529081 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5254386Z 
2025-12-04T10:11:57.5254675Z [W1204 09:35:13.354673814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5254678Z 
2025-12-04T10:11:57.5254974Z [W1204 09:35:13.357532083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5254977Z 
2025-12-04T10:11:57.5255266Z [W1204 09:35:13.358082982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5255269Z 
2025-12-04T10:11:57.5255633Z [W1204 09:35:13.358221654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5255637Z 
2025-12-04T10:11:57.5255925Z [W1204 09:35:13.362670880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5255992Z 
2025-12-04T10:11:57.5256283Z [W1204 09:35:13.363124668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5256286Z 
2025-12-04T10:11:57.5256577Z [W1204 09:35:13.363263180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5256581Z 
2025-12-04T10:11:57.5256672Z ('RERUN', {'yellow': True}) [0.4548s] [100%]
2025-12-04T10:11:57.5257400Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:35:13.804346209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5257405Z 
2025-12-04T10:11:57.5257696Z [W1204 09:35:13.804866157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5257709Z 
2025-12-04T10:11:57.5257998Z [W1204 09:35:13.805005810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5258001Z 
2025-12-04T10:11:57.5258287Z [W1204 09:35:13.807848939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5258290Z 
2025-12-04T10:11:57.5258581Z [W1204 09:35:13.808398578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5258584Z 
2025-12-04T10:11:57.5258876Z [W1204 09:35:13.808537390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5258879Z 
2025-12-04T10:11:57.5259170Z [W1204 09:35:13.812998707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5259173Z 
2025-12-04T10:11:57.5259465Z [W1204 09:35:13.813454804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5259467Z 
2025-12-04T10:11:57.5259760Z [W1204 09:35:13.813592766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5259763Z 
2025-12-04T10:11:57.5259823Z FAILED [0.4494s] [100%]
2025-12-04T10:11:57.5259826Z 
2025-12-04T10:11:57.5259908Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5260209Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5260285Z Traceback (most recent call last):
2025-12-04T10:11:57.5260593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5260659Z     method(*args, **kwargs)
2025-12-04T10:11:57.5260960Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5261032Z     method(*args, **kwargs)
2025-12-04T10:11:57.5261320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5261381Z     with policy():
2025-12-04T10:11:57.5261679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5261745Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5262639Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5262644Z 
2025-12-04T10:11:57.5262838Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5263367Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5263371Z 
2025-12-04T10:11:57.5263530Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5263655Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5263756Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5264108Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5264240Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5264300Z graph_break []
2025-12-04T10:11:57.5264422Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5265117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5265187Z   if out == self.unknown_value:
2025-12-04T10:11:57.5265483Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5265561Z Traceback (most recent call last):
2025-12-04T10:11:57.5265864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5265932Z     method(*args, **kwargs)
2025-12-04T10:11:57.5266226Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5266290Z     method(*args, **kwargs)
2025-12-04T10:11:57.5266585Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5266644Z     with policy():
2025-12-04T10:11:57.5266944Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5267008Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5267826Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5267830Z 
2025-12-04T10:11:57.5267957Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5268478Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5268484Z 
2025-12-04T10:11:57.5268647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5268767Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5268857Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5269210Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5269405Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5269472Z graph_break []
2025-12-04T10:11:57.5269604Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5270293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5270451Z   if out == self.unknown_value:
2025-12-04T10:11:57.5270573Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5270667Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5270790Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5271131Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5271197Z graph_break []
2025-12-04T10:11:57.5271280Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5271573Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5271652Z Traceback (most recent call last):
2025-12-04T10:11:57.5271953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5272020Z     method(*args, **kwargs)
2025-12-04T10:11:57.5272310Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5272372Z     method(*args, **kwargs)
2025-12-04T10:11:57.5272662Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5272721Z     with policy():
2025-12-04T10:11:57.5273014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5273084Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5273900Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5273906Z 
2025-12-04T10:11:57.5274033Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5274553Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5274557Z 
2025-12-04T10:11:57.5274719Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5274842Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5274942Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5275290Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5275414Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5275476Z graph_break []
2025-12-04T10:11:57.5275597Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5276283Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5276357Z   if out == self.unknown_value:
2025-12-04T10:11:57.5276547Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5276639Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5276765Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5277168Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5277233Z graph_break []
2025-12-04T10:11:57.5277356Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5277443Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5277569Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5277904Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5277968Z graph_break []
2025-12-04T10:11:57.5278455Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e5a9540a53f5bbd7.xml -
2025-12-04T10:11:57.5278554Z =========================== short test summary info ============================
2025-12-04T10:11:57.5279854Z FAILED [0.4494s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5279858Z 
2025-12-04T10:11:57.5280019Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5280550Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5280554Z 
2025-12-04T10:11:57.5280711Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5280816Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5280930Z ================== 1 failed, 57 deselected, 2 rerun in 11.62s ==================
2025-12-04T10:11:57.5280989Z Got exit code 1
2025-12-04T10:11:57.5281475Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5281723Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.5281993Z W1204 09:35:20.448000 40362 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5282381Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f5535d6178d67f54.xml
2025-12-04T10:11:57.5282476Z ============================= test session starts ==============================
2025-12-04T10:11:57.5282694Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5282761Z cachedir: .pytest_cache
2025-12-04T10:11:57.5283065Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5283151Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5283218Z configfile: pytest.ini
2025-12-04T10:11:57.5283542Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5283738Z collecting ... collected 58 items / 11 deselected / 47 selected
2025-12-04T10:11:57.5283827Z stepcurrent: skipping 11 already run items.
2025-12-04T10:11:57.5283902Z Running 47 items in this shard
2025-12-04T10:11:57.5283906Z 
2025-12-04T10:11:57.5284398Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9262s] [  2%]
2025-12-04T10:11:57.5284955Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5400s] [  2%]
2025-12-04T10:11:57.5285395Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.5357s] [  2%]
2025-12-04T10:11:57.5285399Z 
2025-12-04T10:11:57.5285482Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5285779Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5285852Z Traceback (most recent call last):
2025-12-04T10:11:57.5286168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5286232Z     method(*args, **kwargs)
2025-12-04T10:11:57.5286527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5286594Z     method(*args, **kwargs)
2025-12-04T10:11:57.5286883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5286947Z     with policy():
2025-12-04T10:11:57.5287243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5287310Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5288108Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5288115Z 
2025-12-04T10:11:57.5288238Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5288758Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5288761Z 
2025-12-04T10:11:57.5288917Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5289041Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5289139Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5289685Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5289816Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5289874Z graph_break []
2025-12-04T10:11:57.5290165Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5290244Z Traceback (most recent call last):
2025-12-04T10:11:57.5290542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5290609Z     method(*args, **kwargs)
2025-12-04T10:11:57.5290972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5291036Z     method(*args, **kwargs)
2025-12-04T10:11:57.5291329Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5291452Z     with policy():
2025-12-04T10:11:57.5291750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5291819Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5292632Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5292637Z 
2025-12-04T10:11:57.5292761Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5293278Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5293282Z 
2025-12-04T10:11:57.5293443Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5293566Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5293659Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5294202Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5294328Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5294391Z graph_break []
2025-12-04T10:11:57.5294516Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5294605Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5294731Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5295272Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5295333Z graph_break []
2025-12-04T10:11:57.5295421Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5295710Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5295789Z Traceback (most recent call last):
2025-12-04T10:11:57.5296101Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5296165Z     method(*args, **kwargs)
2025-12-04T10:11:57.5296468Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5296531Z     method(*args, **kwargs)
2025-12-04T10:11:57.5296824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5296889Z     with policy():
2025-12-04T10:11:57.5297196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5297266Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5298154Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5298159Z 
2025-12-04T10:11:57.5298288Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5298809Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5298894Z 
2025-12-04T10:11:57.5299048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5299177Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5299268Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5299811Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5299937Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5299995Z graph_break []
2025-12-04T10:11:57.5300124Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5300211Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5300334Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5300878Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5300939Z graph_break []
2025-12-04T10:11:57.5301062Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5301148Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5301267Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5301807Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5301868Z graph_break []
2025-12-04T10:11:57.5302363Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f5535d6178d67f54.xml -
2025-12-04T10:11:57.5302461Z =========================== short test summary info ============================
2025-12-04T10:11:57.5303748Z FAILED [0.5357s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5303761Z 
2025-12-04T10:11:57.5303881Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5304407Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5304415Z 
2025-12-04T10:11:57.5304573Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5304678Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5304798Z ================== 1 failed, 11 deselected, 2 rerun in 3.03s ===================
2025-12-04T10:11:57.5304857Z Got exit code 1
2025-12-04T10:11:57.5304921Z Retrying single test...
2025-12-04T10:11:57.5305258Z W1204 09:35:30.094000 40544 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5305646Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-839913cdd4a5fdb2.xml
2025-12-04T10:11:57.5305808Z ============================= test session starts ==============================
2025-12-04T10:11:57.5306014Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5306081Z cachedir: .pytest_cache
2025-12-04T10:11:57.5306389Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5306465Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5306530Z configfile: pytest.ini
2025-12-04T10:11:57.5306856Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5306987Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5307565Z stepcurrent: skipping 11 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5307637Z Running 1 items in this shard
2025-12-04T10:11:57.5307641Z 
2025-12-04T10:11:57.5308367Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:35:31.685016588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5308376Z 
2025-12-04T10:11:57.5308675Z [W1204 09:35:40.600890771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5308678Z 
2025-12-04T10:11:57.5308972Z [W1204 09:35:40.601159026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5308975Z 
2025-12-04T10:11:57.5309264Z [W1204 09:35:40.607469144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5309270Z 
2025-12-04T10:11:57.5309557Z [W1204 09:35:40.608093154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5309560Z 
2025-12-04T10:11:57.5309850Z [W1204 09:35:40.608280018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5309854Z 
2025-12-04T10:11:57.5310144Z [W1204 09:35:40.613922034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5310148Z 
2025-12-04T10:11:57.5310443Z [W1204 09:35:40.614473843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5310446Z 
2025-12-04T10:11:57.5310734Z [W1204 09:35:40.614628826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5310740Z 
2025-12-04T10:11:57.5310822Z ('RERUN', {'yellow': True}) [10.8677s] [100%]
2025-12-04T10:11:57.5311543Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:35:41.422313061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5311547Z 
2025-12-04T10:11:57.5311836Z [W1204 09:35:41.422880661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5311845Z 
2025-12-04T10:11:57.5312200Z [W1204 09:35:41.423019903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5312204Z 
2025-12-04T10:11:57.5312491Z [W1204 09:35:41.426065886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5312558Z 
2025-12-04T10:11:57.5312852Z [W1204 09:35:41.426522913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5312855Z 
2025-12-04T10:11:57.5313144Z [W1204 09:35:41.426660676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5313147Z 
2025-12-04T10:11:57.5313443Z [W1204 09:35:41.431423178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5313447Z 
2025-12-04T10:11:57.5313740Z [W1204 09:35:41.431890946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5313743Z 
2025-12-04T10:11:57.5314037Z [W1204 09:35:41.432027898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5314043Z 
2025-12-04T10:11:57.5314121Z ('RERUN', {'yellow': True}) [0.5030s] [100%]
2025-12-04T10:11:57.5314841Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:35:41.924569509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5314848Z 
2025-12-04T10:11:57.5315147Z [W1204 09:35:41.925127528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5315151Z 
2025-12-04T10:11:57.5315441Z [W1204 09:35:41.925265421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5315444Z 
2025-12-04T10:11:57.5315738Z [W1204 09:35:41.928304833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5315744Z 
2025-12-04T10:11:57.5316032Z [W1204 09:35:41.928764361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5316035Z 
2025-12-04T10:11:57.5316326Z [W1204 09:35:41.928900933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5316330Z 
2025-12-04T10:11:57.5316614Z [W1204 09:35:41.933621874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5316617Z 
2025-12-04T10:11:57.5316915Z [W1204 09:35:41.934096252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5316919Z 
2025-12-04T10:11:57.5317357Z [W1204 09:35:41.934235395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5317361Z 
2025-12-04T10:11:57.5317429Z FAILED [0.4964s] [100%]
2025-12-04T10:11:57.5317432Z 
2025-12-04T10:11:57.5317514Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5317806Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5317887Z Traceback (most recent call last):
2025-12-04T10:11:57.5318194Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5318260Z     method(*args, **kwargs)
2025-12-04T10:11:57.5318679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5318749Z     method(*args, **kwargs)
2025-12-04T10:11:57.5319044Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5319103Z     with policy():
2025-12-04T10:11:57.5319488Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5319560Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5320399Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5320404Z 
2025-12-04T10:11:57.5320532Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5321051Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5321055Z 
2025-12-04T10:11:57.5321217Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5321348Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5321441Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5321990Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5322120Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5322179Z graph_break []
2025-12-04T10:11:57.5322319Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5323021Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5323097Z   if out == self.unknown_value:
2025-12-04T10:11:57.5323389Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5323462Z Traceback (most recent call last):
2025-12-04T10:11:57.5323764Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5323828Z     method(*args, **kwargs)
2025-12-04T10:11:57.5324126Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5324189Z     method(*args, **kwargs)
2025-12-04T10:11:57.5324481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5324545Z     with policy():
2025-12-04T10:11:57.5324842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5324910Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5325741Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5325745Z 
2025-12-04T10:11:57.5325869Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5326487Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5326491Z 
2025-12-04T10:11:57.5326650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5326779Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5327015Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5327555Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5327696Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5327755Z graph_break []
2025-12-04T10:11:57.5327884Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5328570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5328641Z   if out == self.unknown_value:
2025-12-04T10:11:57.5328769Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5328857Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5328978Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5329520Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5329581Z graph_break []
2025-12-04T10:11:57.5329667Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5329958Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5330035Z Traceback (most recent call last):
2025-12-04T10:11:57.5330340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5330406Z     method(*args, **kwargs)
2025-12-04T10:11:57.5330700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5330761Z     method(*args, **kwargs)
2025-12-04T10:11:57.5331055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5331119Z     with policy():
2025-12-04T10:11:57.5331410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5331476Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5332305Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5332311Z 
2025-12-04T10:11:57.5332435Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5332953Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5332957Z 
2025-12-04T10:11:57.5333110Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5333237Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5333326Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5333937Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5334130Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5334188Z graph_break []
2025-12-04T10:11:57.5334314Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5335002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5335070Z   if out == self.unknown_value:
2025-12-04T10:11:57.5335194Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5335287Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5335412Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5335952Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5336012Z graph_break []
2025-12-04T10:11:57.5336138Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5336225Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5336345Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5337082Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5337149Z graph_break []
2025-12-04T10:11:57.5337650Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-839913cdd4a5fdb2.xml -
2025-12-04T10:11:57.5337750Z =========================== short test summary info ============================
2025-12-04T10:11:57.5339056Z FAILED [0.4964s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5339061Z 
2025-12-04T10:11:57.5339189Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5339715Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5339723Z 
2025-12-04T10:11:57.5339888Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5339993Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5340115Z ================== 1 failed, 57 deselected, 2 rerun in 11.89s ==================
2025-12-04T10:11:57.5340175Z Got exit code 1
2025-12-04T10:11:57.5340240Z Retrying single test...
2025-12-04T10:11:57.5340509Z W1204 09:35:48.533000 40731 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5340898Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca344a44fcbdba6a.xml
2025-12-04T10:11:57.5341081Z ============================= test session starts ==============================
2025-12-04T10:11:57.5341291Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5341357Z cachedir: .pytest_cache
2025-12-04T10:11:57.5341735Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5341811Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5341877Z configfile: pytest.ini
2025-12-04T10:11:57.5342196Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5342323Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5342897Z stepcurrent: skipping 11 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5342969Z Running 1 items in this shard
2025-12-04T10:11:57.5342973Z 
2025-12-04T10:11:57.5343704Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:35:50.120608693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5343711Z 
2025-12-04T10:11:57.5344007Z [W1204 09:35:59.127030871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5344011Z 
2025-12-04T10:11:57.5344302Z [W1204 09:35:59.127297296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5344309Z 
2025-12-04T10:11:57.5344601Z [W1204 09:35:59.133262728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5344605Z 
2025-12-04T10:11:57.5344891Z [W1204 09:35:59.133886979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5344894Z 
2025-12-04T10:11:57.5345201Z [W1204 09:35:59.134068632 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5345205Z 
2025-12-04T10:11:57.5345495Z [W1204 09:35:59.139482454 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5345498Z 
2025-12-04T10:11:57.5345792Z [W1204 09:35:59.140031724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5345795Z 
2025-12-04T10:11:57.5346084Z [W1204 09:35:59.140198507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5346090Z 
2025-12-04T10:11:57.5346174Z ('RERUN', {'yellow': True}) [10.9491s] [100%]
2025-12-04T10:11:57.5346897Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:35:59.939979529 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5346904Z 
2025-12-04T10:11:57.5347196Z [W1204 09:36:00.940560889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5347199Z 
2025-12-04T10:11:57.5347489Z [W1204 09:36:00.940709051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5347492Z 
2025-12-04T10:11:57.5347859Z [W1204 09:36:00.943701593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5347864Z 
2025-12-04T10:11:57.5348166Z [W1204 09:36:00.944156361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5348169Z 
2025-12-04T10:11:57.5348458Z [W1204 09:36:00.944320604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5348525Z 
2025-12-04T10:11:57.5348820Z [W1204 09:36:00.948920463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5348823Z 
2025-12-04T10:11:57.5349113Z [W1204 09:36:00.949373111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5349116Z 
2025-12-04T10:11:57.5349409Z [W1204 09:36:00.949508043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5349415Z 
2025-12-04T10:11:57.5349493Z ('RERUN', {'yellow': True}) [0.5006s] [100%]
2025-12-04T10:11:57.5350217Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:36:00.440240630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5350223Z 
2025-12-04T10:11:57.5350513Z [W1204 09:36:00.440785830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5350517Z 
2025-12-04T10:11:57.5350806Z [W1204 09:36:00.440922802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5350815Z 
2025-12-04T10:11:57.5351104Z [W1204 09:36:00.443779301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5351111Z 
2025-12-04T10:11:57.5351398Z [W1204 09:36:00.444222179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5351402Z 
2025-12-04T10:11:57.5351695Z [W1204 09:36:00.444366281 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5351700Z 
2025-12-04T10:11:57.5351987Z [W1204 09:36:00.448867709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5351990Z 
2025-12-04T10:11:57.5352289Z [W1204 09:36:00.449320157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5352292Z 
2025-12-04T10:11:57.5352578Z [W1204 09:36:00.449453859 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5352581Z 
2025-12-04T10:11:57.5352649Z FAILED [0.4991s] [100%]
2025-12-04T10:11:57.5352652Z 
2025-12-04T10:11:57.5352740Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5353034Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5353120Z Traceback (most recent call last):
2025-12-04T10:11:57.5353426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5353496Z     method(*args, **kwargs)
2025-12-04T10:11:57.5353791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5353853Z     method(*args, **kwargs)
2025-12-04T10:11:57.5354148Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5354210Z     with policy():
2025-12-04T10:11:57.5354600Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5354676Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5355474Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5355544Z 
2025-12-04T10:11:57.5355677Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5356198Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5356202Z 
2025-12-04T10:11:57.5356370Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5356496Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5356590Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5357137Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5357269Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5357334Z graph_break []
2025-12-04T10:11:57.5357454Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5358153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5358227Z   if out == self.unknown_value:
2025-12-04T10:11:57.5358520Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5358596Z Traceback (most recent call last):
2025-12-04T10:11:57.5358895Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5358956Z     method(*args, **kwargs)
2025-12-04T10:11:57.5359250Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5359312Z     method(*args, **kwargs)
2025-12-04T10:11:57.5359601Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5359665Z     with policy():
2025-12-04T10:11:57.5360041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5360120Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5360932Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5360938Z 
2025-12-04T10:11:57.5361065Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5361586Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5361590Z 
2025-12-04T10:11:57.5361747Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5361948Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5362045Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5362586Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5362782Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5362841Z graph_break []
2025-12-04T10:11:57.5362967Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5363656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5363727Z   if out == self.unknown_value:
2025-12-04T10:11:57.5363856Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5363947Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5364084Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5364629Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5364689Z graph_break []
2025-12-04T10:11:57.5364776Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5365069Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5365146Z Traceback (most recent call last):
2025-12-04T10:11:57.5365449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5365513Z     method(*args, **kwargs)
2025-12-04T10:11:57.5365810Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5365875Z     method(*args, **kwargs)
2025-12-04T10:11:57.5366164Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5366229Z     with policy():
2025-12-04T10:11:57.5366524Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5366594Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5367412Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5367415Z 
2025-12-04T10:11:57.5367540Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5368061Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5368067Z 
2025-12-04T10:11:57.5368229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5368357Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5368449Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5368992Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5369193Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5369259Z graph_break []
2025-12-04T10:11:57.5369386Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5370138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5370209Z   if out == self.unknown_value:
2025-12-04T10:11:57.5370337Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5370429Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5370554Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5371095Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5371153Z graph_break []
2025-12-04T10:11:57.5371284Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5371375Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5371500Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5372041Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5372099Z graph_break []
2025-12-04T10:11:57.5372603Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca344a44fcbdba6a.xml -
2025-12-04T10:11:57.5372704Z =========================== short test summary info ============================
2025-12-04T10:11:57.5374003Z FAILED [0.4991s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5374010Z 
2025-12-04T10:11:57.5374132Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5374655Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5374658Z 
2025-12-04T10:11:57.5374818Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5374923Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5375045Z ================== 1 failed, 57 deselected, 2 rerun in 11.97s ==================
2025-12-04T10:11:57.5375106Z Got exit code 1
2025-12-04T10:11:57.5375583Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5375829Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.5376093Z W1204 09:36:07.052000 40918 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5376559Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d9537209be9ce80.xml
2025-12-04T10:11:57.5376656Z ============================= test session starts ==============================
2025-12-04T10:11:57.5376869Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5377022Z cachedir: .pytest_cache
2025-12-04T10:11:57.5377334Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5377420Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5377486Z configfile: pytest.ini
2025-12-04T10:11:57.5377815Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5377950Z collecting ... collected 58 items / 12 deselected / 46 selected
2025-12-04T10:11:57.5378040Z stepcurrent: skipping 12 already run items.
2025-12-04T10:11:57.5378120Z Running 46 items in this shard
2025-12-04T10:11:57.5378124Z 
2025-12-04T10:11:57.5378624Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9332s] [  2%]
2025-12-04T10:11:57.5379109Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5371s] [  2%]
2025-12-04T10:11:57.5379563Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.5333s] [  2%]
2025-12-04T10:11:57.5379567Z 
2025-12-04T10:11:57.5379652Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5379947Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5380021Z Traceback (most recent call last):
2025-12-04T10:11:57.5380327Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5380399Z     method(*args, **kwargs)
2025-12-04T10:11:57.5380695Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5380765Z     method(*args, **kwargs)
2025-12-04T10:11:57.5381055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5381117Z     with policy():
2025-12-04T10:11:57.5381426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5381491Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5382301Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5382305Z 
2025-12-04T10:11:57.5382433Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5382951Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5382955Z 
2025-12-04T10:11:57.5383121Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5383249Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5383345Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5383960Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5384092Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5384158Z graph_break []
2025-12-04T10:11:57.5384448Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5384596Z Traceback (most recent call last):
2025-12-04T10:11:57.5384904Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5384968Z     method(*args, **kwargs)
2025-12-04T10:11:57.5385263Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5385324Z     method(*args, **kwargs)
2025-12-04T10:11:57.5385617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5385681Z     with policy():
2025-12-04T10:11:57.5385977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5386048Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5386861Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5386866Z 
2025-12-04T10:11:57.5386992Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5387509Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5387512Z 
2025-12-04T10:11:57.5387682Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5387813Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5387905Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5388454Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5388582Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5388640Z graph_break []
2025-12-04T10:11:57.5388768Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5388857Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5388977Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5389522Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5389583Z graph_break []
2025-12-04T10:11:57.5389672Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5389961Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5390032Z Traceback (most recent call last):
2025-12-04T10:11:57.5390349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5390414Z     method(*args, **kwargs)
2025-12-04T10:11:57.5390715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5390851Z     method(*args, **kwargs)
2025-12-04T10:11:57.5391144Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5391208Z     with policy():
2025-12-04T10:11:57.5391504Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5391635Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5392455Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5392460Z 
2025-12-04T10:11:57.5392583Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5393111Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5393114Z 
2025-12-04T10:11:57.5393270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5393401Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5393493Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5394033Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5394165Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5394223Z graph_break []
2025-12-04T10:11:57.5394346Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5394443Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5394574Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5395120Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5395183Z graph_break []
2025-12-04T10:11:57.5395304Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5395401Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5395522Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5396064Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5396125Z graph_break []
2025-12-04T10:11:57.5396621Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d9537209be9ce80.xml -
2025-12-04T10:11:57.5396730Z =========================== short test summary info ============================
2025-12-04T10:11:57.5398026Z FAILED [0.5333s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5398031Z 
2025-12-04T10:11:57.5398172Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5398766Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5398770Z 
2025-12-04T10:11:57.5398996Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5399101Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5399217Z ================== 1 failed, 12 deselected, 2 rerun in 3.03s ===================
2025-12-04T10:11:57.5399285Z Got exit code 1
2025-12-04T10:11:57.5399350Z Retrying single test...
2025-12-04T10:11:57.5399618Z W1204 09:36:16.688000 41100 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5400049Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b6de87f4ee6a6c38.xml
2025-12-04T10:11:57.5400157Z ============================= test session starts ==============================
2025-12-04T10:11:57.5400377Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5400446Z cachedir: .pytest_cache
2025-12-04T10:11:57.5400759Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5400843Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5400908Z configfile: pytest.ini
2025-12-04T10:11:57.5401232Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5401363Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5401935Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5402019Z Running 1 items in this shard
2025-12-04T10:11:57.5402024Z 
2025-12-04T10:11:57.5402757Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:36:18.282281481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5402764Z 
2025-12-04T10:11:57.5403071Z [W1204 09:36:27.257591219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5403075Z 
2025-12-04T10:11:57.5403368Z [W1204 09:36:27.257856694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5403372Z 
2025-12-04T10:11:57.5403671Z [W1204 09:36:27.264629780 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5403675Z 
2025-12-04T10:11:57.5403966Z [W1204 09:36:27.265245591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5403970Z 
2025-12-04T10:11:57.5404271Z [W1204 09:36:27.265431384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5404274Z 
2025-12-04T10:11:57.5404565Z [W1204 09:36:27.270808317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5404568Z 
2025-12-04T10:11:57.5404855Z [W1204 09:36:27.271338486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5404862Z 
2025-12-04T10:11:57.5405224Z [W1204 09:36:27.271494948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5405228Z 
2025-12-04T10:11:57.5405314Z ('RERUN', {'yellow': True}) [10.9230s] [100%]
2025-12-04T10:11:57.5406044Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:36:28.072147691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5406132Z 
2025-12-04T10:11:57.5406422Z [W1204 09:36:28.072703111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5406426Z 
2025-12-04T10:11:57.5406723Z [W1204 09:36:28.072844223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5406726Z 
2025-12-04T10:11:57.5407018Z [W1204 09:36:28.075728663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5407021Z 
2025-12-04T10:11:57.5407314Z [W1204 09:36:28.076175360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5407318Z 
2025-12-04T10:11:57.5407610Z [W1204 09:36:28.076321353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5407613Z 
2025-12-04T10:11:57.5407910Z [W1204 09:36:28.080888851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5407913Z 
2025-12-04T10:11:57.5408210Z [W1204 09:36:28.081340929 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5408213Z 
2025-12-04T10:11:57.5408500Z [W1204 09:36:28.081477181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5408506Z 
2025-12-04T10:11:57.5408602Z ('RERUN', {'yellow': True}) [0.5020s] [100%]
2025-12-04T10:11:57.5409323Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:36:28.572773688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5409329Z 
2025-12-04T10:11:57.5409629Z [W1204 09:36:28.573326638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5409632Z 
2025-12-04T10:11:57.5409918Z [W1204 09:36:28.573468460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5409921Z 
2025-12-04T10:11:57.5410214Z [W1204 09:36:28.576310819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5410220Z 
2025-12-04T10:11:57.5410507Z [W1204 09:36:28.576754366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5410510Z 
2025-12-04T10:11:57.5410804Z [W1204 09:36:28.576889969 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5410809Z 
2025-12-04T10:11:57.5411097Z [W1204 09:36:28.581345515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5411100Z 
2025-12-04T10:11:57.5411386Z [W1204 09:36:28.581800033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5411394Z 
2025-12-04T10:11:57.5411680Z [W1204 09:36:28.581935845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5411683Z 
2025-12-04T10:11:57.5411817Z FAILED [0.4986s] [100%]
2025-12-04T10:11:57.5411820Z 
2025-12-04T10:11:57.5411913Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5412211Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5412359Z Traceback (most recent call last):
2025-12-04T10:11:57.5412668Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5412734Z     method(*args, **kwargs)
2025-12-04T10:11:57.5413033Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5413100Z     method(*args, **kwargs)
2025-12-04T10:11:57.5413391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5413459Z     with policy():
2025-12-04T10:11:57.5413754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5413826Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5414619Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5414626Z 
2025-12-04T10:11:57.5414758Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5415286Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5415289Z 
2025-12-04T10:11:57.5415455Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5415588Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5415685Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5416232Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5416375Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5416435Z graph_break []
2025-12-04T10:11:57.5416566Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5417438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5417515Z   if out == self.unknown_value:
2025-12-04T10:11:57.5417814Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5417897Z Traceback (most recent call last):
2025-12-04T10:11:57.5418205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5418268Z     method(*args, **kwargs)
2025-12-04T10:11:57.5418560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5418628Z     method(*args, **kwargs)
2025-12-04T10:11:57.5418919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5418979Z     with policy():
2025-12-04T10:11:57.5419395Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5419464Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5420283Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5420375Z 
2025-12-04T10:11:57.5420502Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5421026Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5421030Z 
2025-12-04T10:11:57.5421191Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5421318Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5421418Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5421962Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5422099Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5422160Z graph_break []
2025-12-04T10:11:57.5422281Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5422981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5423052Z   if out == self.unknown_value:
2025-12-04T10:11:57.5423183Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5423274Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5423398Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5423950Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5424011Z graph_break []
2025-12-04T10:11:57.5424093Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5424386Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5424459Z Traceback (most recent call last):
2025-12-04T10:11:57.5424769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5424833Z     method(*args, **kwargs)
2025-12-04T10:11:57.5425129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5425201Z     method(*args, **kwargs)
2025-12-04T10:11:57.5425490Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5425561Z     with policy():
2025-12-04T10:11:57.5425856Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5425920Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5426811Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5426815Z 
2025-12-04T10:11:57.5426939Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5434555Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5434675Z 
2025-12-04T10:11:57.5434863Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5435008Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5435111Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5435667Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5435806Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5435867Z graph_break []
2025-12-04T10:11:57.5436008Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5436720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5436795Z   if out == self.unknown_value:
2025-12-04T10:11:57.5436931Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5437031Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5437163Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5437714Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5437774Z graph_break []
2025-12-04T10:11:57.5437903Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5437997Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5438118Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5438659Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5438716Z graph_break []
2025-12-04T10:11:57.5439223Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b6de87f4ee6a6c38.xml -
2025-12-04T10:11:57.5439328Z =========================== short test summary info ============================
2025-12-04T10:11:57.5440707Z FAILED [0.4986s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5440716Z 
2025-12-04T10:11:57.5440852Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5441392Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5441396Z 
2025-12-04T10:11:57.5441653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5441765Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5441890Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ==================
2025-12-04T10:11:57.5442026Z Got exit code 1
2025-12-04T10:11:57.5442093Z Retrying single test...
2025-12-04T10:11:57.5442369Z W1204 09:36:35.239000 41287 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5442758Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-462df064e3458fc9.xml
2025-12-04T10:11:57.5442865Z ============================= test session starts ==============================
2025-12-04T10:11:57.5443077Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5443146Z cachedir: .pytest_cache
2025-12-04T10:11:57.5443464Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5443542Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5443609Z configfile: pytest.ini
2025-12-04T10:11:57.5443932Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5444065Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5444639Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5444720Z Running 1 items in this shard
2025-12-04T10:11:57.5444724Z 
2025-12-04T10:11:57.5445470Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:36:36.830662045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5445474Z 
2025-12-04T10:11:57.5445781Z [W1204 09:36:46.951336829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5445787Z 
2025-12-04T10:11:57.5446078Z [W1204 09:36:46.951595803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5446087Z 
2025-12-04T10:11:57.5446375Z [W1204 09:36:46.957503694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5446379Z 
2025-12-04T10:11:57.5446666Z [W1204 09:36:46.958091484 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5446669Z 
2025-12-04T10:11:57.5446964Z [W1204 09:36:46.958275557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5446967Z 
2025-12-04T10:11:57.5447255Z [W1204 09:36:46.963729201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5447261Z 
2025-12-04T10:11:57.5447555Z [W1204 09:36:46.964270250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5447558Z 
2025-12-04T10:11:57.5447847Z [W1204 09:36:46.964443503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5447850Z 
2025-12-04T10:11:57.5447935Z ('RERUN', {'yellow': True}) [11.0699s] [100%]
2025-12-04T10:11:57.5448804Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:36:46.770371191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5448809Z 
2025-12-04T10:11:57.5449105Z [W1204 09:36:46.770925301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5449173Z 
2025-12-04T10:11:57.5449461Z [W1204 09:36:46.771068763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5449464Z 
2025-12-04T10:11:57.5449752Z [W1204 09:36:46.774001864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5449759Z 
2025-12-04T10:11:57.5450046Z [W1204 09:36:46.774452242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5450050Z 
2025-12-04T10:11:57.5450338Z [W1204 09:36:46.774589124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5450342Z 
2025-12-04T10:11:57.5450636Z [W1204 09:36:46.779069492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5450642Z 
2025-12-04T10:11:57.5450927Z [W1204 09:36:46.779521619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5450931Z 
2025-12-04T10:11:57.5451225Z [W1204 09:36:46.779657171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5451228Z 
2025-12-04T10:11:57.5451306Z ('RERUN', {'yellow': True}) [0.5027s] [100%]
2025-12-04T10:11:57.5452031Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:36:47.271747383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5452035Z 
2025-12-04T10:11:57.5452323Z [W1204 09:36:47.272304953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5452329Z 
2025-12-04T10:11:57.5452621Z [W1204 09:36:47.272444625 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5452625Z 
2025-12-04T10:11:57.5452911Z [W1204 09:36:47.275324004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5452914Z 
2025-12-04T10:11:57.5453204Z [W1204 09:36:47.275774852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5453207Z 
2025-12-04T10:11:57.5453503Z [W1204 09:36:47.275913354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5453506Z 
2025-12-04T10:11:57.5453795Z [W1204 09:36:47.280457402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5453801Z 
2025-12-04T10:11:57.5454095Z [W1204 09:36:47.280923550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5454098Z 
2025-12-04T10:11:57.5454384Z [W1204 09:36:47.281059782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5454387Z 
2025-12-04T10:11:57.5454453Z FAILED [0.4987s] [100%]
2025-12-04T10:11:57.5454456Z 
2025-12-04T10:11:57.5454540Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5454907Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5454991Z Traceback (most recent call last):
2025-12-04T10:11:57.5455304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5455772Z     method(*args, **kwargs)
2025-12-04T10:11:57.5456072Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5456139Z     method(*args, **kwargs)
2025-12-04T10:11:57.5456437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5456496Z     with policy():
2025-12-04T10:11:57.5456798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5456868Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5457671Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5457678Z 
2025-12-04T10:11:57.5457816Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5458336Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5458340Z 
2025-12-04T10:11:57.5458504Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5458635Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5458735Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5459289Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5459421Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5459488Z graph_break []
2025-12-04T10:11:57.5459613Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5460310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5460386Z   if out == self.unknown_value:
2025-12-04T10:11:57.5460681Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5460763Z Traceback (most recent call last):
2025-12-04T10:11:57.5461063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5461129Z     method(*args, **kwargs)
2025-12-04T10:11:57.5461430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5461494Z     method(*args, **kwargs)
2025-12-04T10:11:57.5461785Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5461850Z     with policy():
2025-12-04T10:11:57.5462157Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5462232Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5463114Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5463119Z 
2025-12-04T10:11:57.5463247Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5463835Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5463839Z 
2025-12-04T10:11:57.5463997Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5464126Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5464220Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5464764Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5464895Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5464956Z graph_break []
2025-12-04T10:11:57.5465086Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5465776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5465858Z   if out == self.unknown_value:
2025-12-04T10:11:57.5465991Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5466083Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5466211Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5466755Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5466816Z graph_break []
2025-12-04T10:11:57.5466905Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5467192Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5467271Z Traceback (most recent call last):
2025-12-04T10:11:57.5467568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5467630Z     method(*args, **kwargs)
2025-12-04T10:11:57.5467928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5467997Z     method(*args, **kwargs)
2025-12-04T10:11:57.5468286Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5468353Z     with policy():
2025-12-04T10:11:57.5468650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5468719Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5469529Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5469534Z 
2025-12-04T10:11:57.5469663Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5470279Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5470283Z 
2025-12-04T10:11:57.5470441Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5470633Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5470724Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5471282Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5471408Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5471466Z graph_break []
2025-12-04T10:11:57.5471593Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5472294Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5472368Z   if out == self.unknown_value:
2025-12-04T10:11:57.5472493Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5472582Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5472713Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5473254Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5473311Z graph_break []
2025-12-04T10:11:57.5473439Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5473527Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5473655Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5474191Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5474253Z graph_break []
2025-12-04T10:11:57.5474745Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-462df064e3458fc9.xml -
2025-12-04T10:11:57.5474846Z =========================== short test summary info ============================
2025-12-04T10:11:57.5476137Z FAILED [0.4987s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5476144Z 
2025-12-04T10:11:57.5476273Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5476795Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5476798Z 
2025-12-04T10:11:57.5476956Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5477059Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5477254Z ================== 1 failed, 57 deselected, 2 rerun in 12.10s ==================
2025-12-04T10:11:57.5477314Z Got exit code 1
2025-12-04T10:11:57.5477802Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5478116Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.5478381Z W1204 09:36:53.864000 41474 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5478771Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f351581eb409e8d.xml
2025-12-04T10:11:57.5478868Z ============================= test session starts ==============================
2025-12-04T10:11:57.5479080Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5479150Z cachedir: .pytest_cache
2025-12-04T10:11:57.5479455Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5479536Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5479606Z configfile: pytest.ini
2025-12-04T10:11:57.5479960Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5480097Z collecting ... collected 58 items / 13 deselected / 45 selected
2025-12-04T10:11:57.5480186Z stepcurrent: skipping 13 already run items.
2025-12-04T10:11:57.5480259Z Running 45 items in this shard
2025-12-04T10:11:57.5480263Z 
2025-12-04T10:11:57.5480766Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0000s] [  2%]
2025-12-04T10:11:57.5481260Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6409s] [  2%]
2025-12-04T10:11:57.5481721Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.6613s] [  2%]
2025-12-04T10:11:57.5481728Z 
2025-12-04T10:11:57.5481812Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5482114Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5482188Z Traceback (most recent call last):
2025-12-04T10:11:57.5482495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5482567Z     method(*args, **kwargs)
2025-12-04T10:11:57.5482865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5482933Z     method(*args, **kwargs)
2025-12-04T10:11:57.5483221Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5483284Z     with policy():
2025-12-04T10:11:57.5483587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5483653Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5484470Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5484474Z 
2025-12-04T10:11:57.5484676Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5485203Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5485274Z 
2025-12-04T10:11:57.5485434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5485561Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5485660Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5486008Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5486134Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5486199Z graph_break []
2025-12-04T10:11:57.5486506Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5486586Z Traceback (most recent call last):
2025-12-04T10:11:57.5486886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5486952Z     method(*args, **kwargs)
2025-12-04T10:11:57.5487254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5487316Z     method(*args, **kwargs)
2025-12-04T10:11:57.5487603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5487665Z     with policy():
2025-12-04T10:11:57.5487959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5488028Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5488854Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5488861Z 
2025-12-04T10:11:57.5488988Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5489513Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5489517Z 
2025-12-04T10:11:57.5489676Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5489805Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5489898Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5490252Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5490377Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5490437Z graph_break []
2025-12-04T10:11:57.5490565Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5490653Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5490777Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5491121Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5491178Z graph_break []
2025-12-04T10:11:57.5491268Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5491631Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5491705Z Traceback (most recent call last):
2025-12-04T10:11:57.5492020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5492166Z     method(*args, **kwargs)
2025-12-04T10:11:57.5492458Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5492524Z     method(*args, **kwargs)
2025-12-04T10:11:57.5492814Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5492881Z     with policy():
2025-12-04T10:11:57.5493177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5493242Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5494072Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5494078Z 
2025-12-04T10:11:57.5494202Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5494727Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5494731Z 
2025-12-04T10:11:57.5494885Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5495007Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5495104Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5495453Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5495591Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5495654Z graph_break []
2025-12-04T10:11:57.5495781Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5495873Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5495996Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5496340Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5496398Z graph_break []
2025-12-04T10:11:57.5496522Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5496618Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5496736Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5497071Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5497135Z graph_break []
2025-12-04T10:11:57.5497621Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f351581eb409e8d.xml -
2025-12-04T10:11:57.5497724Z =========================== short test summary info ============================
2025-12-04T10:11:57.5499096Z FAILED [0.6613s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5499102Z 
2025-12-04T10:11:57.5499296Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5499817Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5499821Z 
2025-12-04T10:11:57.5499973Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5500095Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5500214Z ================== 1 failed, 13 deselected, 2 rerun in 3.33s ===================
2025-12-04T10:11:57.5500279Z Got exit code 1
2025-12-04T10:11:57.5500346Z Retrying single test...
2025-12-04T10:11:57.5500607Z W1204 09:37:03.664000 41663 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5500998Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-94c0e5e2bee831c2.xml
2025-12-04T10:11:57.5501095Z ============================= test session starts ==============================
2025-12-04T10:11:57.5501305Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5501370Z cachedir: .pytest_cache
2025-12-04T10:11:57.5501673Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5501752Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5501818Z configfile: pytest.ini
2025-12-04T10:11:57.5502133Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5502268Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5502842Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5502920Z Running 1 items in this shard
2025-12-04T10:11:57.5502923Z 
2025-12-04T10:11:57.5503659Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:37:04.890076577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5503663Z 
2025-12-04T10:11:57.5503973Z [W1204 09:37:14.036697559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5503976Z 
2025-12-04T10:11:57.5504270Z [W1204 09:37:14.036959223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5504274Z 
2025-12-04T10:11:57.5504565Z [W1204 09:37:14.042801364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5504572Z 
2025-12-04T10:11:57.5504861Z [W1204 09:37:14.043399834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5504865Z 
2025-12-04T10:11:57.5505154Z [W1204 09:37:14.043572227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5505157Z 
2025-12-04T10:11:57.5505450Z [W1204 09:37:14.049066391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5505523Z 
2025-12-04T10:11:57.5505816Z [W1204 09:37:14.049604530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5505819Z 
2025-12-04T10:11:57.5506112Z [W1204 09:37:14.049768083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5506180Z 
2025-12-04T10:11:57.5506261Z ('RERUN', {'yellow': True}) [11.1798s] [100%]
2025-12-04T10:11:57.5506991Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:37:15.402162565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5506996Z 
2025-12-04T10:11:57.5507286Z [W1204 09:37:15.402701894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5507292Z 
2025-12-04T10:11:57.5507585Z [W1204 09:37:15.402842867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5507588Z 
2025-12-04T10:11:57.5507878Z [W1204 09:37:15.405865299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5507884Z 
2025-12-04T10:11:57.5508173Z [W1204 09:37:15.406433339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5508180Z 
2025-12-04T10:11:57.5508466Z [W1204 09:37:15.406573911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5508469Z 
2025-12-04T10:11:57.5508756Z [W1204 09:37:15.411175620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5508759Z 
2025-12-04T10:11:57.5509052Z [W1204 09:37:15.411641638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5509055Z 
2025-12-04T10:11:57.5509341Z [W1204 09:37:15.411779321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5509347Z 
2025-12-04T10:11:57.5509431Z ('RERUN', {'yellow': True}) [0.5986s] [100%]
2025-12-04T10:11:57.5510156Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:37:16.999664076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5510160Z 
2025-12-04T10:11:57.5510450Z [W1204 09:37:16.000219666 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5510456Z 
2025-12-04T10:11:57.5510740Z [W1204 09:37:16.000371108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5510744Z 
2025-12-04T10:11:57.5511032Z [W1204 09:37:16.003387180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5511042Z 
2025-12-04T10:11:57.5511333Z [W1204 09:37:16.003949710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5511336Z 
2025-12-04T10:11:57.5511622Z [W1204 09:37:16.004088112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5511625Z 
2025-12-04T10:11:57.5511920Z [W1204 09:37:16.008789742 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5511923Z 
2025-12-04T10:11:57.5512279Z [W1204 09:37:16.009251990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5512283Z 
2025-12-04T10:11:57.5512577Z [W1204 09:37:16.009388732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5512645Z 
2025-12-04T10:11:57.5512707Z FAILED [0.5970s] [100%]
2025-12-04T10:11:57.5512710Z 
2025-12-04T10:11:57.5512799Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5513093Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5513169Z Traceback (most recent call last):
2025-12-04T10:11:57.5513480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5513548Z     method(*args, **kwargs)
2025-12-04T10:11:57.5513846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5513926Z     method(*args, **kwargs)
2025-12-04T10:11:57.5514220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5514294Z     with policy():
2025-12-04T10:11:57.5514591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5514657Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5515469Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5515473Z 
2025-12-04T10:11:57.5515601Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5516133Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5516139Z 
2025-12-04T10:11:57.5516297Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5516436Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5516531Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5516883Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5517188Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5517253Z graph_break []
2025-12-04T10:11:57.5517382Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5518080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5518157Z   if out == self.unknown_value:
2025-12-04T10:11:57.5518455Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5518527Z Traceback (most recent call last):
2025-12-04T10:11:57.5518825Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5518892Z     method(*args, **kwargs)
2025-12-04T10:11:57.5519182Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5519380Z     method(*args, **kwargs)
2025-12-04T10:11:57.5519678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5519740Z     with policy():
2025-12-04T10:11:57.5520075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5520267Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5521097Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5521106Z 
2025-12-04T10:11:57.5521233Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5521760Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5521764Z 
2025-12-04T10:11:57.5521925Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5522056Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5522153Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5522498Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5522624Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5522689Z graph_break []
2025-12-04T10:11:57.5522815Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5523513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5523586Z   if out == self.unknown_value:
2025-12-04T10:11:57.5523713Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5523816Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5523945Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5524286Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5524351Z graph_break []
2025-12-04T10:11:57.5524435Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5524741Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5524816Z Traceback (most recent call last):
2025-12-04T10:11:57.5525117Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5525188Z     method(*args, **kwargs)
2025-12-04T10:11:57.5525482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5525553Z     method(*args, **kwargs)
2025-12-04T10:11:57.5525842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5525899Z     with policy():
2025-12-04T10:11:57.5526200Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5526266Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5527166Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5527250Z 
2025-12-04T10:11:57.5527377Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5527906Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5527910Z 
2025-12-04T10:11:57.5528069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5528196Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5528291Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5528635Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5528758Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5528822Z graph_break []
2025-12-04T10:11:57.5528947Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5529630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5529700Z   if out == self.unknown_value:
2025-12-04T10:11:57.5529820Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5529915Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5530036Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5530377Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5530438Z graph_break []
2025-12-04T10:11:57.5530561Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5530654Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5530774Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5531112Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5531174Z graph_break []
2025-12-04T10:11:57.5531673Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-94c0e5e2bee831c2.xml -
2025-12-04T10:11:57.5531776Z =========================== short test summary info ============================
2025-12-04T10:11:57.5533085Z FAILED [0.5970s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5533092Z 
2025-12-04T10:11:57.5533221Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5533745Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5533749Z 
2025-12-04T10:11:57.5533974Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5534083Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5534196Z ================== 1 failed, 57 deselected, 2 rerun in 12.40s ==================
2025-12-04T10:11:57.5534322Z Got exit code 1
2025-12-04T10:11:57.5534389Z Retrying single test...
2025-12-04T10:11:57.5534650Z W1204 09:37:22.590000 41857 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5535037Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7a973581a4e2c554.xml
2025-12-04T10:11:57.5535131Z ============================= test session starts ==============================
2025-12-04T10:11:57.5535337Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5535407Z cachedir: .pytest_cache
2025-12-04T10:11:57.5535715Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5535794Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5535859Z configfile: pytest.ini
2025-12-04T10:11:57.5536173Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5536304Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5536890Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5536963Z Running 1 items in this shard
2025-12-04T10:11:57.5536972Z 
2025-12-04T10:11:57.5537716Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:37:23.787505701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5537720Z 
2025-12-04T10:11:57.5538016Z [W1204 09:37:32.831611167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5538025Z 
2025-12-04T10:11:57.5538313Z [W1204 09:37:32.831855751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5538316Z 
2025-12-04T10:11:57.5538605Z [W1204 09:37:32.837421867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5538608Z 
2025-12-04T10:11:57.5538902Z [W1204 09:37:32.837958926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5538905Z 
2025-12-04T10:11:57.5539194Z [W1204 09:37:32.838126429 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5539197Z 
2025-12-04T10:11:57.5539485Z [W1204 09:37:32.843467702 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5539491Z 
2025-12-04T10:11:57.5539776Z [W1204 09:37:32.843972360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5539780Z 
2025-12-04T10:11:57.5540073Z [W1204 09:37:32.844130633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5540076Z 
2025-12-04T10:11:57.5540156Z ('RERUN', {'yellow': True}) [11.0508s] [100%]
2025-12-04T10:11:57.5540956Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:37:34.201330827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5540965Z 
2025-12-04T10:11:57.5541258Z [W1204 09:37:34.201855836 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5541325Z 
2025-12-04T10:11:57.5541613Z [W1204 09:37:34.201995008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5541616Z 
2025-12-04T10:11:57.5541907Z [W1204 09:37:34.204899229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5541910Z 
2025-12-04T10:11:57.5542198Z [W1204 09:37:34.205450608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5542201Z 
2025-12-04T10:11:57.5542495Z [W1204 09:37:34.205587301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5542498Z 
2025-12-04T10:11:57.5542785Z [W1204 09:37:34.210070079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5542791Z 
2025-12-04T10:11:57.5543083Z [W1204 09:37:34.210542007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5543086Z 
2025-12-04T10:11:57.5543375Z [W1204 09:37:34.210681179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5543378Z 
2025-12-04T10:11:57.5543461Z ('RERUN', {'yellow': True}) [0.6000s] [100%]
2025-12-04T10:11:57.5544189Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:37:34.797474906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5544193Z 
2025-12-04T10:11:57.5544480Z [W1204 09:37:34.797998665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5544490Z 
2025-12-04T10:11:57.5544776Z [W1204 09:37:34.798150498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5544779Z 
2025-12-04T10:11:57.5545065Z [W1204 09:37:34.801073549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5545068Z 
2025-12-04T10:11:57.5545364Z [W1204 09:37:34.801618858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5545367Z 
2025-12-04T10:11:57.5545660Z [W1204 09:37:34.801757080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5545663Z 
2025-12-04T10:11:57.5545952Z [W1204 09:37:34.806222118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5545958Z 
2025-12-04T10:11:57.5546243Z [W1204 09:37:34.806675396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5546246Z 
2025-12-04T10:11:57.5546538Z [W1204 09:37:34.806813378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5546541Z 
2025-12-04T10:11:57.5546602Z FAILED [0.5964s] [100%]
2025-12-04T10:11:57.5546605Z 
2025-12-04T10:11:57.5546684Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5547080Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5547158Z Traceback (most recent call last):
2025-12-04T10:11:57.5547471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5547601Z     method(*args, **kwargs)
2025-12-04T10:11:57.5547893Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5547962Z     method(*args, **kwargs)
2025-12-04T10:11:57.5548256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5548318Z     with policy():
2025-12-04T10:11:57.5548615Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5548680Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5549503Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5549509Z 
2025-12-04T10:11:57.5549636Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5550164Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5550167Z 
2025-12-04T10:11:57.5550323Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5550452Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5550551Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5550904Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5551036Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5551098Z graph_break []
2025-12-04T10:11:57.5551221Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5551917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5551989Z   if out == self.unknown_value:
2025-12-04T10:11:57.5552290Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5552363Z Traceback (most recent call last):
2025-12-04T10:11:57.5552660Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5552727Z     method(*args, **kwargs)
2025-12-04T10:11:57.5553018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5553083Z     method(*args, **kwargs)
2025-12-04T10:11:57.5553376Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5553434Z     with policy():
2025-12-04T10:11:57.5553730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5553794Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5554700Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5554705Z 
2025-12-04T10:11:57.5554837Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5555427Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5555431Z 
2025-12-04T10:11:57.5555591Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5555716Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5555809Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5556158Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5556283Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5556347Z graph_break []
2025-12-04T10:11:57.5556472Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5557160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5557232Z   if out == self.unknown_value:
2025-12-04T10:11:57.5557354Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5557446Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5557567Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5557909Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5557974Z graph_break []
2025-12-04T10:11:57.5558057Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5558349Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5558427Z Traceback (most recent call last):
2025-12-04T10:11:57.5558720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5558787Z     method(*args, **kwargs)
2025-12-04T10:11:57.5559080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5559142Z     method(*args, **kwargs)
2025-12-04T10:11:57.5559440Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5559501Z     with policy():
2025-12-04T10:11:57.5559796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5559860Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5560742Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5560746Z 
2025-12-04T10:11:57.5560880Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5561401Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5561405Z 
2025-12-04T10:11:57.5561721Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5561848Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5561941Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5562365Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5562490Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5562553Z graph_break []
2025-12-04T10:11:57.5562678Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5563366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5563444Z   if out == self.unknown_value:
2025-12-04T10:11:57.5563566Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5563660Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5563784Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5564125Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5564186Z graph_break []
2025-12-04T10:11:57.5564310Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5564397Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5564522Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5564861Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5564925Z graph_break []
2025-12-04T10:11:57.5565409Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7a973581a4e2c554.xml -
2025-12-04T10:11:57.5565513Z =========================== short test summary info ============================
2025-12-04T10:11:57.5566816Z FAILED [0.5964s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5566821Z 
2025-12-04T10:11:57.5566949Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5567476Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5567482Z 
2025-12-04T10:11:57.5567637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5567744Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5567858Z ================== 1 failed, 57 deselected, 2 rerun in 12.27s ==================
2025-12-04T10:11:57.5567918Z Got exit code 1
2025-12-04T10:11:57.5568403Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5568717Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.5568985Z W1204 09:37:41.436000 42051 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5569370Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-518d2a063958b0ac.xml
2025-12-04T10:11:57.5569529Z ============================= test session starts ==============================
2025-12-04T10:11:57.5569739Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5569808Z cachedir: .pytest_cache
2025-12-04T10:11:57.5570117Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5570193Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5570257Z configfile: pytest.ini
2025-12-04T10:11:57.5570584Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5570714Z collecting ... collected 58 items / 14 deselected / 44 selected
2025-12-04T10:11:57.5570800Z stepcurrent: skipping 14 already run items.
2025-12-04T10:11:57.5570876Z Running 44 items in this shard
2025-12-04T10:11:57.5570883Z 
2025-12-04T10:11:57.5571378Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8728s] [  2%]
2025-12-04T10:11:57.5571871Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4552s] [  2%]
2025-12-04T10:11:57.5572309Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4565s] [  2%]
2025-12-04T10:11:57.5572316Z 
2025-12-04T10:11:57.5572399Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5572693Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5572768Z Traceback (most recent call last):
2025-12-04T10:11:57.5573079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5573143Z     method(*args, **kwargs)
2025-12-04T10:11:57.5573436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5573503Z     method(*args, **kwargs)
2025-12-04T10:11:57.5573791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5573855Z     with policy():
2025-12-04T10:11:57.5574149Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5574214Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5575018Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5575024Z 
2025-12-04T10:11:57.5575147Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5575672Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5575676Z 
2025-12-04T10:11:57.5575831Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5576048Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5576145Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5576498Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5576692Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5576750Z graph_break []
2025-12-04T10:11:57.5577038Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5577113Z Traceback (most recent call last):
2025-12-04T10:11:57.5577409Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5577478Z     method(*args, **kwargs)
2025-12-04T10:11:57.5577767Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5577828Z     method(*args, **kwargs)
2025-12-04T10:11:57.5578132Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5578197Z     with policy():
2025-12-04T10:11:57.5578491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5578565Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5579376Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5579381Z 
2025-12-04T10:11:57.5579510Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5580024Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5580031Z 
2025-12-04T10:11:57.5580189Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5580313Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5580405Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5580753Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5580879Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5580939Z graph_break []
2025-12-04T10:11:57.5581071Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5581159Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5581285Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5581621Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5581682Z graph_break []
2025-12-04T10:11:57.5581768Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5582054Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5582130Z Traceback (most recent call last):
2025-12-04T10:11:57.5582422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5582484Z     method(*args, **kwargs)
2025-12-04T10:11:57.5582846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5582909Z     method(*args, **kwargs)
2025-12-04T10:11:57.5583197Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5583322Z     with policy():
2025-12-04T10:11:57.5583613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5583681Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5584493Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5584496Z 
2025-12-04T10:11:57.5584621Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5585139Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5585144Z 
2025-12-04T10:11:57.5585297Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5585424Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5585513Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5585853Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5585984Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5586041Z graph_break []
2025-12-04T10:11:57.5586169Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5586257Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5586375Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5586715Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5586774Z graph_break []
2025-12-04T10:11:57.5586903Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5586996Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5587118Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5587472Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5587531Z graph_break []
2025-12-04T10:11:57.5588022Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-518d2a063958b0ac.xml -
2025-12-04T10:11:57.5588128Z =========================== short test summary info ============================
2025-12-04T10:11:57.5589414Z FAILED [0.4565s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5589418Z 
2025-12-04T10:11:57.5589545Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5590132Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5590136Z 
2025-12-04T10:11:57.5590293Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5590461Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5590575Z ================== 1 failed, 14 deselected, 2 rerun in 2.81s ===================
2025-12-04T10:11:57.5590650Z Got exit code 1
2025-12-04T10:11:57.5590715Z Retrying single test...
2025-12-04T10:11:57.5590980Z W1204 09:37:51.099000 42239 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5591367Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1cf5b0397cd79e9.xml
2025-12-04T10:11:57.5591464Z ============================= test session starts ==============================
2025-12-04T10:11:57.5591678Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5591742Z cachedir: .pytest_cache
2025-12-04T10:11:57.5592045Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5592126Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5592190Z configfile: pytest.ini
2025-12-04T10:11:57.5592513Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5592638Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5593208Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5593285Z Running 1 items in this shard
2025-12-04T10:11:57.5593289Z 
2025-12-04T10:11:57.5594016Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:37:52.148646446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5594023Z 
2025-12-04T10:11:57.5594325Z [W1204 09:38:01.210659416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5594329Z 
2025-12-04T10:11:57.5594616Z [W1204 09:38:01.210921841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5594620Z 
2025-12-04T10:11:57.5594916Z [W1204 09:38:01.216778892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5594919Z 
2025-12-04T10:11:57.5595206Z [W1204 09:38:01.217344071 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5595209Z 
2025-12-04T10:11:57.5595499Z [W1204 09:38:01.217509394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5595505Z 
2025-12-04T10:11:57.5595791Z [W1204 09:38:01.222933818 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5595794Z 
2025-12-04T10:11:57.5596080Z [W1204 09:38:01.223483867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5596087Z 
2025-12-04T10:11:57.5596377Z [W1204 09:38:01.223655690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5596381Z 
2025-12-04T10:11:57.5596533Z ('RERUN', {'yellow': True}) [10.9176s] [100%]
2025-12-04T10:11:57.5597264Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:38:02.396496690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5597356Z 
2025-12-04T10:11:57.5597648Z [W1204 09:38:02.397023299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5597651Z 
2025-12-04T10:11:57.5597942Z [W1204 09:38:02.397168402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5597945Z 
2025-12-04T10:11:57.5598232Z [W1204 09:38:02.400024251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5598235Z 
2025-12-04T10:11:57.5598530Z [W1204 09:38:02.400583730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5598534Z 
2025-12-04T10:11:57.5598823Z [W1204 09:38:02.400723193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5598829Z 
2025-12-04T10:11:57.5599121Z [W1204 09:38:02.405121058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5599125Z 
2025-12-04T10:11:57.5599415Z [W1204 09:38:02.405577156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5599418Z 
2025-12-04T10:11:57.5599703Z [W1204 09:38:02.405716119 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5599711Z 
2025-12-04T10:11:57.5599791Z ('RERUN', {'yellow': True}) [0.4168s] [100%]
2025-12-04T10:11:57.5600550Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:38:02.810225669 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5600556Z 
2025-12-04T10:11:57.5600851Z [W1204 09:38:02.810750167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5600854Z 
2025-12-04T10:11:57.5601141Z [W1204 09:38:02.810890840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5601144Z 
2025-12-04T10:11:57.5601432Z [W1204 09:38:02.813718129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5601436Z 
2025-12-04T10:11:57.5601725Z [W1204 09:38:02.814261628 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5601727Z 
2025-12-04T10:11:57.5602016Z [W1204 09:38:02.814402550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5602022Z 
2025-12-04T10:11:57.5602306Z [W1204 09:38:02.818760665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5602309Z 
2025-12-04T10:11:57.5602595Z [W1204 09:38:02.819206413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5602602Z 
2025-12-04T10:11:57.5602903Z [W1204 09:38:02.819343336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5602906Z 
2025-12-04T10:11:57.5602968Z FAILED [0.4117s] [100%]
2025-12-04T10:11:57.5602971Z 
2025-12-04T10:11:57.5603129Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5603426Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5603606Z Traceback (most recent call last):
2025-12-04T10:11:57.5603909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5603973Z     method(*args, **kwargs)
2025-12-04T10:11:57.5604269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5604332Z     method(*args, **kwargs)
2025-12-04T10:11:57.5604618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5604680Z     with policy():
2025-12-04T10:11:57.5604977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5605047Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5605844Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5605850Z 
2025-12-04T10:11:57.5605977Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5606498Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5606503Z 
2025-12-04T10:11:57.5606661Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5606798Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5606895Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5607240Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5607374Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5607432Z graph_break []
2025-12-04T10:11:57.5607560Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5608252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5608321Z   if out == self.unknown_value:
2025-12-04T10:11:57.5608618Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5608692Z Traceback (most recent call last):
2025-12-04T10:11:57.5608993Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5609062Z     method(*args, **kwargs)
2025-12-04T10:11:57.5609351Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5609421Z     method(*args, **kwargs)
2025-12-04T10:11:57.5609715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5609778Z     with policy():
2025-12-04T10:11:57.5610074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5610139Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5611028Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5611097Z 
2025-12-04T10:11:57.5611223Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5611746Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5611750Z 
2025-12-04T10:11:57.5611903Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5612028Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5612139Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5612486Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5612614Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5612675Z graph_break []
2025-12-04T10:11:57.5612798Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5613494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5613565Z   if out == self.unknown_value:
2025-12-04T10:11:57.5613688Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5613781Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5613903Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5614251Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5614314Z graph_break []
2025-12-04T10:11:57.5614398Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5614690Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5614764Z Traceback (most recent call last):
2025-12-04T10:11:57.5615063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5615126Z     method(*args, **kwargs)
2025-12-04T10:11:57.5615415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5615483Z     method(*args, **kwargs)
2025-12-04T10:11:57.5615776Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5615844Z     with policy():
2025-12-04T10:11:57.5616151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5616221Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5617229Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5617234Z 
2025-12-04T10:11:57.5617367Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5618002Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5618007Z 
2025-12-04T10:11:57.5618170Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5618389Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5618493Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5618844Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5618979Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5619047Z graph_break []
2025-12-04T10:11:57.5619175Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5619870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5619940Z   if out == self.unknown_value:
2025-12-04T10:11:57.5620060Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5620159Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5620282Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5620628Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5620685Z graph_break []
2025-12-04T10:11:57.5620811Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5620904Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5621027Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5621369Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5621430Z graph_break []
2025-12-04T10:11:57.5621921Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1cf5b0397cd79e9.xml -
2025-12-04T10:11:57.5622023Z =========================== short test summary info ============================
2025-12-04T10:11:57.5623318Z FAILED [0.4117s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5623322Z 
2025-12-04T10:11:57.5623452Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5623971Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5623977Z 
2025-12-04T10:11:57.5624139Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5624242Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5624355Z ================== 1 failed, 57 deselected, 2 rerun in 11.77s ==================
2025-12-04T10:11:57.5624418Z Got exit code 1
2025-12-04T10:11:57.5624484Z Retrying single test...
2025-12-04T10:11:57.5624820Z W1204 09:38:09.509000 42432 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5625207Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b693ef47858459cd.xml
2025-12-04T10:11:57.5625301Z ============================= test session starts ==============================
2025-12-04T10:11:57.5625597Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5625663Z cachedir: .pytest_cache
2025-12-04T10:11:57.5625966Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5626047Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5626111Z configfile: pytest.ini
2025-12-04T10:11:57.5626427Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5626559Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5627123Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5627202Z Running 1 items in this shard
2025-12-04T10:11:57.5627206Z 
2025-12-04T10:11:57.5627935Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:38:10.562451910 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5627939Z 
2025-12-04T10:11:57.5628251Z [W1204 09:38:19.833132329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5628255Z 
2025-12-04T10:11:57.5628557Z [W1204 09:38:19.833414604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5628561Z 
2025-12-04T10:11:57.5628854Z [W1204 09:38:19.839285875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
﻿2025-12-04T10:11:57.5631890Z 
2025-12-04T10:11:57.5632260Z [W1204 09:38:19.839880015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5632265Z 
2025-12-04T10:11:57.5632572Z [W1204 09:38:19.840092229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5632576Z 
2025-12-04T10:11:57.5632877Z [W1204 09:38:19.845637034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5632881Z 
2025-12-04T10:11:57.5633176Z [W1204 09:38:19.846195734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5633180Z 
2025-12-04T10:11:57.5633471Z [W1204 09:38:19.846377487 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5633476Z 
2025-12-04T10:11:57.5633584Z ('RERUN', {'yellow': True}) [11.1268s] [100%]
2025-12-04T10:11:57.5634346Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:38:21.013635192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5634350Z 
2025-12-04T10:11:57.5634650Z [W1204 09:38:21.014157051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5634654Z 
2025-12-04T10:11:57.5635045Z [W1204 09:38:21.014301353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5635050Z 
2025-12-04T10:11:57.5635346Z [W1204 09:38:21.017179133 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5635384Z 
2025-12-04T10:11:57.5635674Z [W1204 09:38:21.017725442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5635677Z 
2025-12-04T10:11:57.5635968Z [W1204 09:38:21.017862985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5635971Z 
2025-12-04T10:11:57.5636256Z [W1204 09:38:21.022275871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5636261Z 
2025-12-04T10:11:57.5636553Z [W1204 09:38:21.022728079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5636556Z 
2025-12-04T10:11:57.5636841Z [W1204 09:38:21.022865951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5636845Z 
2025-12-04T10:11:57.5636926Z ('RERUN', {'yellow': True}) [0.4151s] [100%]
2025-12-04T10:11:57.5637666Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:38:21.427366203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5637670Z 
2025-12-04T10:11:57.5637959Z [W1204 09:38:21.427878992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5637962Z 
2025-12-04T10:11:57.5638252Z [W1204 09:38:21.428022865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5638256Z 
2025-12-04T10:11:57.5638540Z [W1204 09:38:21.430906015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5638545Z 
2025-12-04T10:11:57.5638834Z [W1204 09:38:21.431450284 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5638898Z 
2025-12-04T10:11:57.5639184Z [W1204 09:38:21.431592366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5639188Z 
2025-12-04T10:11:57.5639478Z [W1204 09:38:21.435995622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5639481Z 
2025-12-04T10:11:57.5639770Z [W1204 09:38:21.436456230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5639774Z 
2025-12-04T10:11:57.5640136Z [W1204 09:38:21.436600763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5640144Z 
2025-12-04T10:11:57.5640212Z FAILED [0.4117s] [100%]
2025-12-04T10:11:57.5640217Z 
2025-12-04T10:11:57.5640306Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5640614Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5640693Z Traceback (most recent call last):
2025-12-04T10:11:57.5641007Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5641080Z     method(*args, **kwargs)
2025-12-04T10:11:57.5641370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5641524Z     method(*args, **kwargs)
2025-12-04T10:11:57.5641826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5641888Z     with policy():
2025-12-04T10:11:57.5642224Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5642293Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5643107Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5643112Z 
2025-12-04T10:11:57.5643247Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5643776Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5643785Z 
2025-12-04T10:11:57.5643948Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5644085Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5644189Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5644541Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5644672Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5644738Z graph_break []
2025-12-04T10:11:57.5644865Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5645576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5645651Z   if out == self.unknown_value:
2025-12-04T10:11:57.5645953Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5646099Z Traceback (most recent call last):
2025-12-04T10:11:57.5646401Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5646469Z     method(*args, **kwargs)
2025-12-04T10:11:57.5646756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5646819Z     method(*args, **kwargs)
2025-12-04T10:11:57.5647111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5647171Z     with policy():
2025-12-04T10:11:57.5647466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5647536Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5648352Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5648357Z 
2025-12-04T10:11:57.5648493Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5649028Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5649102Z 
2025-12-04T10:11:57.5649269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5649397Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5649491Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5649885Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5650018Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5650078Z graph_break []
2025-12-04T10:11:57.5650219Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5650927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5651002Z   if out == self.unknown_value:
2025-12-04T10:11:57.5651129Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5651222Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5651350Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5651693Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5651756Z graph_break []
2025-12-04T10:11:57.5651839Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5652133Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5652211Z Traceback (most recent call last):
2025-12-04T10:11:57.5652516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5652590Z     method(*args, **kwargs)
2025-12-04T10:11:57.5652885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5652995Z     method(*args, **kwargs)
2025-12-04T10:11:57.5653288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5653349Z     with policy():
2025-12-04T10:11:57.5653640Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5653710Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5654532Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5654536Z 
2025-12-04T10:11:57.5654667Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5655190Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5655197Z 
2025-12-04T10:11:57.5655359Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5655482Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5655572Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5655920Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5656132Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5656195Z graph_break []
2025-12-04T10:11:57.5656321Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5657012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5657124Z   if out == self.unknown_value:
2025-12-04T10:11:57.5657246Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5657335Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5657462Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5657802Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5657865Z graph_break []
2025-12-04T10:11:57.5657997Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5658087Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5658215Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5658561Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5658618Z graph_break []
2025-12-04T10:11:57.5659116Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b693ef47858459cd.xml -
2025-12-04T10:11:57.5659214Z =========================== short test summary info ============================
2025-12-04T10:11:57.5660514Z FAILED [0.4117s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5660561Z 
2025-12-04T10:11:57.5660687Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5661215Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5661221Z 
2025-12-04T10:11:57.5661501Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5661684Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5661813Z ================== 1 failed, 57 deselected, 2 rerun in 11.98s ==================
2025-12-04T10:11:57.5661873Z Got exit code 1
2025-12-04T10:11:57.5662352Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5662598Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.5662869Z W1204 09:38:28.076000 42625 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5663267Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c603aefabd564f6f.xml
2025-12-04T10:11:57.5663365Z ============================= test session starts ==============================
2025-12-04T10:11:57.5663665Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5663737Z cachedir: .pytest_cache
2025-12-04T10:11:57.5664046Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5664165Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5664233Z configfile: pytest.ini
2025-12-04T10:11:57.5664550Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5664684Z collecting ... collected 58 items / 15 deselected / 43 selected
2025-12-04T10:11:57.5664773Z stepcurrent: skipping 15 already run items.
2025-12-04T10:11:57.5664847Z Running 43 items in this shard
2025-12-04T10:11:57.5664850Z 
2025-12-04T10:11:57.5665348Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9250s] [  2%]
2025-12-04T10:11:57.5665832Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5379s] [  2%]
2025-12-04T10:11:57.5666280Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5288s] [  2%]
2025-12-04T10:11:57.5666287Z 
2025-12-04T10:11:57.5666369Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5666669Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5666744Z Traceback (most recent call last):
2025-12-04T10:11:57.5667055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5667124Z     method(*args, **kwargs)
2025-12-04T10:11:57.5667414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5667483Z     method(*args, **kwargs)
2025-12-04T10:11:57.5667766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5667876Z     with policy():
2025-12-04T10:11:57.5668179Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5668243Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5669047Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5669054Z 
2025-12-04T10:11:57.5669191Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5669710Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5669721Z 
2025-12-04T10:11:57.5669877Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5670005Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5670102Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5670651Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5670855Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5670916Z graph_break []
2025-12-04T10:11:57.5671206Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5671288Z Traceback (most recent call last):
2025-12-04T10:11:57.5671697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5671763Z     method(*args, **kwargs)
2025-12-04T10:11:57.5672053Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5672115Z     method(*args, **kwargs)
2025-12-04T10:11:57.5672403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5672462Z     with policy():
2025-12-04T10:11:57.5672753Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5672822Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5673631Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5673638Z 
2025-12-04T10:11:57.5673768Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5674286Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5674290Z 
2025-12-04T10:11:57.5674446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5674590Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5674682Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5675227Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5675395Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5675458Z graph_break []
2025-12-04T10:11:57.5675591Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5675680Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5675798Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5676343Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5676402Z graph_break []
2025-12-04T10:11:57.5676489Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5676776Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5676852Z Traceback (most recent call last):
2025-12-04T10:11:57.5677148Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5677211Z     method(*args, **kwargs)
2025-12-04T10:11:57.5677502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5677564Z     method(*args, **kwargs)
2025-12-04T10:11:57.5677850Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5677982Z     with policy():
2025-12-04T10:11:57.5678272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5678338Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5679197Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5679203Z 
2025-12-04T10:11:57.5679326Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5679846Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5679850Z 
2025-12-04T10:11:57.5680111Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5680245Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5680335Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5680878Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5681009Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5681069Z graph_break []
2025-12-04T10:11:57.5681198Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5681284Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5681401Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5681943Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5682003Z graph_break []
2025-12-04T10:11:57.5682199Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5682286Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5682407Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5682946Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5683005Z graph_break []
2025-12-04T10:11:57.5683499Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c603aefabd564f6f.xml -
2025-12-04T10:11:57.5683603Z =========================== short test summary info ============================
2025-12-04T10:11:57.5684891Z FAILED [0.5288s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5684908Z 
2025-12-04T10:11:57.5685037Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5685621Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5685626Z 
2025-12-04T10:11:57.5685789Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5685891Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5686047Z ================== 1 failed, 15 deselected, 2 rerun in 3.02s ===================
2025-12-04T10:11:57.5686107Z Got exit code 1
2025-12-04T10:11:57.5686172Z Retrying single test...
2025-12-04T10:11:57.5686440Z W1204 09:38:37.784000 42814 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5686825Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-476ed3473033d71c.xml
2025-12-04T10:11:57.5686918Z ============================= test session starts ==============================
2025-12-04T10:11:57.5687132Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5687198Z cachedir: .pytest_cache
2025-12-04T10:11:57.5687511Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5687589Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5687655Z configfile: pytest.ini
2025-12-04T10:11:57.5687987Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5688118Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5688685Z stepcurrent: skipping 15 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5688761Z Running 1 items in this shard
2025-12-04T10:11:57.5688765Z 
2025-12-04T10:11:57.5689497Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:38:39.369785747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5689542Z 
2025-12-04T10:11:57.5689844Z [W1204 09:38:48.626508113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5689848Z 
2025-12-04T10:11:57.5690135Z [W1204 09:38:48.626764167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5690138Z 
2025-12-04T10:11:57.5690429Z [W1204 09:38:48.633079837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5690433Z 
2025-12-04T10:11:57.5690720Z [W1204 09:38:48.633686177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5690723Z 
2025-12-04T10:11:57.5691015Z [W1204 09:38:48.633875531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5691020Z 
2025-12-04T10:11:57.5691306Z [W1204 09:38:48.639243253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5691309Z 
2025-12-04T10:11:57.5691599Z [W1204 09:38:48.639771942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5691602Z 
2025-12-04T10:11:57.5691888Z [W1204 09:38:48.639931785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5691891Z 
2025-12-04T10:11:57.5691972Z ('RERUN', {'yellow': True}) [11.1971s] [100%]
2025-12-04T10:11:57.5692762Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:38:49.441441919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5692799Z 
2025-12-04T10:11:57.5693085Z [W1204 09:38:49.441948728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5693089Z 
2025-12-04T10:11:57.5693378Z [W1204 09:38:49.442087940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5693382Z 
2025-12-04T10:11:57.5693666Z [W1204 09:38:49.444944450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5693670Z 
2025-12-04T10:11:57.5693963Z [W1204 09:38:49.445391577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5693966Z 
2025-12-04T10:11:57.5694251Z [W1204 09:38:49.445530340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5694255Z 
2025-12-04T10:11:57.5694549Z [W1204 09:38:49.449965366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5694553Z 
2025-12-04T10:11:57.5694845Z [W1204 09:38:49.450469055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5694848Z 
2025-12-04T10:11:57.5695134Z [W1204 09:38:49.450609747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5695140Z 
2025-12-04T10:11:57.5695219Z ('RERUN', {'yellow': True}) [0.5018s] [100%]
2025-12-04T10:11:57.5695948Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:38:50.940807752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5695954Z 
2025-12-04T10:11:57.5696290Z [W1204 09:38:50.941312871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5696293Z 
2025-12-04T10:11:57.5696582Z [W1204 09:38:50.941451783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5696585Z 
2025-12-04T10:11:57.5696876Z [W1204 09:38:50.944309633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5696879Z 
2025-12-04T10:11:57.5697164Z [W1204 09:38:50.944755281 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5697167Z 
2025-12-04T10:11:57.5697456Z [W1204 09:38:50.944891053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5697459Z 
2025-12-04T10:11:57.5697744Z [W1204 09:38:50.949308240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5697749Z 
2025-12-04T10:11:57.5698044Z [W1204 09:38:50.949758057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5698048Z 
2025-12-04T10:11:57.5698333Z [W1204 09:38:50.949895410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5698336Z 
2025-12-04T10:11:57.5698399Z FAILED [0.4980s] [100%]
2025-12-04T10:11:57.5698402Z 
2025-12-04T10:11:57.5698557Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5698856Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5698935Z Traceback (most recent call last):
2025-12-04T10:11:57.5699271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5699338Z     method(*args, **kwargs)
2025-12-04T10:11:57.5699635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5699698Z     method(*args, **kwargs)
2025-12-04T10:11:57.5699986Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5700049Z     with policy():
2025-12-04T10:11:57.5700355Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5700427Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5701225Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5701234Z 
2025-12-04T10:11:57.5701364Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5701887Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5701891Z 
2025-12-04T10:11:57.5702048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5702183Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5702275Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5702822Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5702994Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5703052Z graph_break []
2025-12-04T10:11:57.5703183Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5703871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5703944Z   if out == self.unknown_value:
2025-12-04T10:11:57.5704237Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5704310Z Traceback (most recent call last):
2025-12-04T10:11:57.5704608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5704686Z     method(*args, **kwargs)
2025-12-04T10:11:57.5704975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5705042Z     method(*args, **kwargs)
2025-12-04T10:11:57.5705330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5705397Z     with policy():
2025-12-04T10:11:57.5705690Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5705756Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5706659Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5706714Z 
2025-12-04T10:11:57.5706841Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5707366Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5707370Z 
2025-12-04T10:11:57.5707525Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5707646Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5707740Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5708289Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5708475Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5708573Z graph_break []
2025-12-04T10:11:57.5708732Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5709483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5709586Z   if out == self.unknown_value:
2025-12-04T10:11:57.5709743Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5710054Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5710223Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5710797Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5710979Z graph_break []
2025-12-04T10:11:57.5711098Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5711514Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5711633Z Traceback (most recent call last):
2025-12-04T10:11:57.5711960Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5712090Z     method(*args, **kwargs)
2025-12-04T10:11:57.5712485Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5712598Z     method(*args, **kwargs)
2025-12-04T10:11:57.5712981Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5713091Z     with policy():
2025-12-04T10:11:57.5713447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5713545Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5714392Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5714459Z 
2025-12-04T10:11:57.5714679Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5715281Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5715320Z 
2025-12-04T10:11:57.5715571Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5715732Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5715904Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5716478Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5716619Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5716809Z graph_break []
2025-12-04T10:11:57.5716963Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5717911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5718034Z   if out == self.unknown_value:
2025-12-04T10:11:57.5718191Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5718403Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5718575Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5719276Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5719370Z graph_break []
2025-12-04T10:11:57.5719523Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5719666Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5719901Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5720600Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5720747Z graph_break []
2025-12-04T10:11:57.5721267Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-476ed3473033d71c.xml -
2025-12-04T10:11:57.5721432Z =========================== short test summary info ============================
2025-12-04T10:11:57.5722746Z FAILED [0.4980s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5722753Z 
2025-12-04T10:11:57.5723040Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5723598Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5723602Z 
2025-12-04T10:11:57.5723825Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5724073Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5724226Z ================== 1 failed, 57 deselected, 2 rerun in 12.22s ==================
2025-12-04T10:11:57.5724402Z Got exit code 1
2025-12-04T10:11:57.5724514Z Retrying single test...
2025-12-04T10:11:57.5724863Z W1204 09:38:56.591000 43008 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5725320Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38079583fa3f76bd.xml
2025-12-04T10:11:57.5725444Z ============================= test session starts ==============================
2025-12-04T10:11:57.5725720Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5725864Z cachedir: .pytest_cache
2025-12-04T10:11:57.5726296Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5726458Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5726558Z configfile: pytest.ini
2025-12-04T10:11:57.5726952Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5727102Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5727744Z stepcurrent: skipping 15 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5727897Z Running 1 items in this shard
2025-12-04T10:11:57.5727902Z 
2025-12-04T10:11:57.5728664Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:38:58.202269031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5728669Z 
2025-12-04T10:11:57.5729033Z [W1204 09:39:07.121005188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5729038Z 
2025-12-04T10:11:57.5729423Z [W1204 09:39:07.121265812 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5729428Z 
2025-12-04T10:11:57.5729810Z [W1204 09:39:07.127646362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5729814Z 
2025-12-04T10:11:57.5730144Z [W1204 09:39:07.128246022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5730148Z 
2025-12-04T10:11:57.5730503Z [W1204 09:39:07.128443366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5730506Z 
2025-12-04T10:11:57.5730821Z [W1204 09:39:07.133822619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5730824Z 
2025-12-04T10:11:57.5731158Z [W1204 09:39:07.134364359 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5731197Z 
2025-12-04T10:11:57.5731500Z [W1204 09:39:07.134530842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5731503Z 
2025-12-04T10:11:57.5731670Z ('RERUN', {'yellow': True}) [10.8808s] [100%]
2025-12-04T10:11:57.5732552Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:39:07.932230053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5732556Z 
2025-12-04T10:11:57.5732894Z [W1204 09:39:07.932747933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5732931Z 
2025-12-04T10:11:57.5733285Z [W1204 09:39:07.932891785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5733291Z 
2025-12-04T10:11:57.5733682Z [W1204 09:39:07.935777025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5733686Z 
2025-12-04T10:11:57.5734077Z [W1204 09:39:07.936223743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5734081Z 
2025-12-04T10:11:57.5734426Z [W1204 09:39:07.936369385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5734430Z 
2025-12-04T10:11:57.5734803Z [W1204 09:39:08.940917844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5734806Z 
2025-12-04T10:11:57.5735125Z [W1204 09:39:08.941375732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5735130Z 
2025-12-04T10:11:57.5735479Z [W1204 09:39:08.941513325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5735482Z 
2025-12-04T10:11:57.5735578Z ('RERUN', {'yellow': True}) [0.4985s] [100%]
2025-12-04T10:11:57.5736384Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:39:08.430512485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5736435Z 
2025-12-04T10:11:57.5736768Z [W1204 09:39:08.431034654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5736772Z 
2025-12-04T10:11:57.5737090Z [W1204 09:39:08.431183296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5737145Z 
2025-12-04T10:11:57.5737502Z [W1204 09:39:08.434134587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5737505Z 
2025-12-04T10:11:57.5737823Z [W1204 09:39:08.434597965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5737827Z 
2025-12-04T10:11:57.5738214Z [W1204 09:39:08.434737968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5738221Z 
2025-12-04T10:11:57.5738572Z [W1204 09:39:08.439284336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5738576Z 
2025-12-04T10:11:57.5738922Z [W1204 09:39:08.439752894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5738929Z 
2025-12-04T10:11:57.5739248Z [W1204 09:39:08.439888916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5739252Z 
2025-12-04T10:11:57.5739378Z FAILED [0.4964s] [100%]
2025-12-04T10:11:57.5739382Z 
2025-12-04T10:11:57.5739482Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5739855Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5740102Z Traceback (most recent call last):
2025-12-04T10:11:57.5740447Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5740617Z     method(*args, **kwargs)
2025-12-04T10:11:57.5740971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5741105Z     method(*args, **kwargs)
2025-12-04T10:11:57.5741535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5741628Z     with policy():
2025-12-04T10:11:57.5741953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5742086Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5742937Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5742942Z 
2025-12-04T10:11:57.5743175Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5743757Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5743761Z 
2025-12-04T10:11:57.5743994Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5744158Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5744301Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5744901Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5745121Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5745257Z graph_break []
2025-12-04T10:11:57.5745418Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5746221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5746358Z   if out == self.unknown_value:
2025-12-04T10:11:57.5746666Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5746876Z Traceback (most recent call last):
2025-12-04T10:11:57.5747209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5747376Z     method(*args, **kwargs)
2025-12-04T10:11:57.5747742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5747837Z     method(*args, **kwargs)
2025-12-04T10:11:57.5748145Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5748330Z     with policy():
2025-12-04T10:11:57.5748652Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5748797Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5749720Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5749726Z 
2025-12-04T10:11:57.5749907Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5750504Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5750544Z 
2025-12-04T10:11:57.5750767Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5750959Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5751083Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5751691Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5751835Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5751970Z graph_break []
2025-12-04T10:11:57.5752188Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5752907Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5753043Z   if out == self.unknown_value:
2025-12-04T10:11:57.5753194Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5753312Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5753648Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5754219Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5754341Z graph_break []
2025-12-04T10:11:57.5754456Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5754827Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5755015Z Traceback (most recent call last):
2025-12-04T10:11:57.5755355Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5755451Z     method(*args, **kwargs)
2025-12-04T10:11:57.5755806Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5755899Z     method(*args, **kwargs)
2025-12-04T10:11:57.5756266Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5756405Z     with policy():
2025-12-04T10:11:57.5756742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5756877Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5757724Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5757728Z 
2025-12-04T10:11:57.5757942Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5758548Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5758552Z 
2025-12-04T10:11:57.5758843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5759000Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5759155Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5759777Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5760097Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5760263Z graph_break []
2025-12-04T10:11:57.5760435Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5761164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5761317Z   if out == self.unknown_value:
2025-12-04T10:11:57.5761469Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5761613Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5761816Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5762413Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5762554Z graph_break []
2025-12-04T10:11:57.5762709Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5762828Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5763002Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5763626Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5763846Z graph_break []
2025-12-04T10:11:57.5764375Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38079583fa3f76bd.xml -
2025-12-04T10:11:57.5764547Z =========================== short test summary info ============================
2025-12-04T10:11:57.5765875Z FAILED [0.4964s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5765882Z 
2025-12-04T10:11:57.5766134Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5766705Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5766709Z 
2025-12-04T10:11:57.5766912Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5767081Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5767301Z ================== 1 failed, 57 deselected, 2 rerun in 11.90s ==================
2025-12-04T10:11:57.5767408Z Got exit code 1
2025-12-04T10:11:57.5768056Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5768372Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.5768742Z W1204 09:39:15.093000 43202 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5769160Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc6fbe2f84088a12.xml
2025-12-04T10:11:57.5769320Z ============================= test session starts ==============================
2025-12-04T10:11:57.5769544Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5769710Z cachedir: .pytest_cache
2025-12-04T10:11:57.5770101Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5770211Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5770307Z configfile: pytest.ini
2025-12-04T10:11:57.5770691Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5770856Z collecting ... collected 58 items / 16 deselected / 42 selected
2025-12-04T10:11:57.5771069Z stepcurrent: skipping 16 already run items.
2025-12-04T10:11:57.5771170Z Running 42 items in this shard
2025-12-04T10:11:57.5771175Z 
2025-12-04T10:11:57.5771706Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8864s] [  2%]
2025-12-04T10:11:57.5772264Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4925s] [  2%]
2025-12-04T10:11:57.5772754Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4802s] [  2%]
2025-12-04T10:11:57.5772798Z 
2025-12-04T10:11:57.5772983Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5773319Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5773540Z Traceback (most recent call last):
2025-12-04T10:11:57.5773877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5773990Z     method(*args, **kwargs)
2025-12-04T10:11:57.5774333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5774481Z     method(*args, **kwargs)
2025-12-04T10:11:57.5774845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5774936Z     with policy():
2025-12-04T10:11:57.5775259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5775419Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5776238Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5776243Z 
2025-12-04T10:11:57.5776495Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5777120Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5777124Z 
2025-12-04T10:11:57.5777328Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5777568Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5777694Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5778144Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5778324Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5778431Z graph_break []
2025-12-04T10:11:57.5778790Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5778897Z Traceback (most recent call last):
2025-12-04T10:11:57.5779249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5779393Z     method(*args, **kwargs)
2025-12-04T10:11:57.5779796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5779955Z     method(*args, **kwargs)
2025-12-04T10:11:57.5780273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5780363Z     with policy():
2025-12-04T10:11:57.5780705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5780876Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5781797Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5781802Z 
2025-12-04T10:11:57.5781960Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5782596Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5782600Z 
2025-12-04T10:11:57.5782786Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5782933Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5783171Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5783549Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5783744Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5783832Z graph_break []
2025-12-04T10:11:57.5783987Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5784189Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5784354Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5784760Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5784851Z graph_break []
2025-12-04T10:11:57.5784966Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5785321Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5785626Z Traceback (most recent call last):
2025-12-04T10:11:57.5785996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5786128Z     method(*args, **kwargs)
2025-12-04T10:11:57.5786493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5786659Z     method(*args, **kwargs)
2025-12-04T10:11:57.5786969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5787117Z     with policy():
2025-12-04T10:11:57.5787496Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5787595Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5788511Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5788515Z 
2025-12-04T10:11:57.5788672Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5789211Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5789292Z 
2025-12-04T10:11:57.5789496Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5789655Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5789825Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5790203Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5790357Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5790508Z graph_break []
2025-12-04T10:11:57.5790678Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5790890Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5791040Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5791483Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5791601Z graph_break []
2025-12-04T10:11:57.5791803Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5791967Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5792140Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5792511Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5792637Z graph_break []
2025-12-04T10:11:57.5793154Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc6fbe2f84088a12.xml -
2025-12-04T10:11:57.5793378Z =========================== short test summary info ============================
2025-12-04T10:11:57.5794809Z FAILED [0.4802s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5794814Z 
2025-12-04T10:11:57.5795007Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5795559Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5795598Z 
2025-12-04T10:11:57.5795784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5795987Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5796165Z ================== 1 failed, 16 deselected, 2 rerun in 2.88s ===================
2025-12-04T10:11:57.5796288Z Got exit code 1
2025-12-04T10:11:57.5796383Z Retrying single test...
2025-12-04T10:11:57.5796679Z W1204 09:39:24.745000 43390 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5797129Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a9efcd8b80cecd97.xml
2025-12-04T10:11:57.5797321Z ============================= test session starts ==============================
2025-12-04T10:11:57.5797581Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5797784Z cachedir: .pytest_cache
2025-12-04T10:11:57.5798120Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5798259Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5798345Z configfile: pytest.ini
2025-12-04T10:11:57.5798774Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5798988Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5799592Z stepcurrent: skipping 16 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5799772Z Running 1 items in this shard
2025-12-04T10:11:57.5799776Z 
2025-12-04T10:11:57.5800610Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:39:25.832083876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5800615Z 
2025-12-04T10:11:57.5801020Z [W1204 09:39:34.861239927 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5801025Z 
2025-12-04T10:11:57.5801371Z [W1204 09:39:34.861491871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5801375Z 
2025-12-04T10:11:57.5801697Z [W1204 09:39:34.867303240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5801737Z 
2025-12-04T10:11:57.5802071Z [W1204 09:39:34.867885430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5802075Z 
2025-12-04T10:11:57.5802396Z [W1204 09:39:34.868055873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5802400Z 
2025-12-04T10:11:57.5802754Z [W1204 09:39:34.873513866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5802758Z 
2025-12-04T10:11:57.5803193Z [W1204 09:39:34.874039615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5803197Z 
2025-12-04T10:11:57.5803569Z [W1204 09:39:34.874194328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5803606Z 
2025-12-04T10:11:57.5803718Z ('RERUN', {'yellow': True}) [10.9214s] [100%]
2025-12-04T10:11:57.5804531Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:39:36.083535800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5804535Z 
2025-12-04T10:11:57.5804856Z [W1204 09:39:36.084066819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5804859Z 
2025-12-04T10:11:57.5805247Z [W1204 09:39:36.084205092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5805251Z 
2025-12-04T10:11:57.5805659Z [W1204 09:39:36.087088121 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5805665Z 
2025-12-04T10:11:57.5806033Z [W1204 09:39:36.087642560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5806037Z 
2025-12-04T10:11:57.5806359Z [W1204 09:39:36.087781593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5806362Z 
2025-12-04T10:11:57.5806680Z [W1204 09:39:36.092382871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5806717Z 
2025-12-04T10:11:57.5807025Z [W1204 09:39:36.092846719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5807028Z 
2025-12-04T10:11:57.5807390Z [W1204 09:39:36.092984652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5807394Z 
2025-12-04T10:11:57.5807559Z ('RERUN', {'yellow': True}) [0.4527s] [100%]
2025-12-04T10:11:57.5808369Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:39:36.535540620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5808373Z 
2025-12-04T10:11:57.5808728Z [W1204 09:39:36.536062009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5808732Z 
2025-12-04T10:11:57.5809054Z [W1204 09:39:36.536201771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5809058Z 
2025-12-04T10:11:57.5809445Z [W1204 09:39:36.539098391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5809449Z 
2025-12-04T10:11:57.5809784Z [W1204 09:39:36.539645970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5809789Z 
2025-12-04T10:11:57.5810161Z [W1204 09:39:36.539782842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5810165Z 
2025-12-04T10:11:57.5810481Z [W1204 09:39:36.544248438 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5810484Z 
2025-12-04T10:11:57.5810872Z [W1204 09:39:36.544720936 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5810911Z 
2025-12-04T10:11:57.5811218Z [W1204 09:39:36.544858859 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5811221Z 
2025-12-04T10:11:57.5811438Z FAILED [0.4487s] [100%]
2025-12-04T10:11:57.5811444Z 
2025-12-04T10:11:57.5811608Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5811940Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5812079Z Traceback (most recent call last):
2025-12-04T10:11:57.5812503Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5812588Z     method(*args, **kwargs)
2025-12-04T10:11:57.5813024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5813125Z     method(*args, **kwargs)
2025-12-04T10:11:57.5813445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5813571Z     with policy():
2025-12-04T10:11:57.5813896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5814075Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5814931Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5814935Z 
2025-12-04T10:11:57.5815129Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5815690Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5815694Z 
2025-12-04T10:11:57.5815883Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5816122Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5816290Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5816716Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5816886Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5816977Z graph_break []
2025-12-04T10:11:57.5817356Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5818086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5818301Z   if out == self.unknown_value:
2025-12-04T10:11:57.5818636Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5818746Z Traceback (most recent call last):
2025-12-04T10:11:57.5819205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5819301Z     method(*args, **kwargs)
2025-12-04T10:11:57.5819689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5819805Z     method(*args, **kwargs)
2025-12-04T10:11:57.5820241Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5820389Z     with policy():
2025-12-04T10:11:57.5820714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5820812Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5825762Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5825773Z 
2025-12-04T10:11:57.5825952Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5826504Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5826516Z 
2025-12-04T10:11:57.5826695Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5826834Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5826937Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5827297Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5827428Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5827492Z graph_break []
2025-12-04T10:11:57.5827623Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5828335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5828412Z   if out == self.unknown_value:
2025-12-04T10:11:57.5828539Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5828637Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5828872Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5829219Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5829282Z graph_break []
2025-12-04T10:11:57.5829366Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5829669Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5829750Z Traceback (most recent call last):
2025-12-04T10:11:57.5830068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5830142Z     method(*args, **kwargs)
2025-12-04T10:11:57.5830436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5830503Z     method(*args, **kwargs)
2025-12-04T10:11:57.5830825Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5830887Z     with policy():
2025-12-04T10:11:57.5831192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5831274Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5832189Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5832194Z 
2025-12-04T10:11:57.5832337Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5832864Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5832904Z 
2025-12-04T10:11:57.5833076Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5833214Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5833314Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5833676Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5833809Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5833872Z graph_break []
2025-12-04T10:11:57.5833999Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5834707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5834793Z   if out == self.unknown_value:
2025-12-04T10:11:57.5834918Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5835009Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5835141Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5835488Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5835551Z graph_break []
2025-12-04T10:11:57.5835674Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5835764Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5835895Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5836275Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5836350Z graph_break []
2025-12-04T10:11:57.5836853Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a9efcd8b80cecd97.xml -
2025-12-04T10:11:57.5836959Z =========================== short test summary info ============================
2025-12-04T10:11:57.5838275Z FAILED [0.4487s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5838282Z 
2025-12-04T10:11:57.5838411Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5838944Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5838948Z 
2025-12-04T10:11:57.5839106Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5839305Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5839424Z ================== 1 failed, 57 deselected, 2 rerun in 11.85s ==================
2025-12-04T10:11:57.5839482Z Got exit code 1
2025-12-04T10:11:57.5839552Z Retrying single test...
2025-12-04T10:11:57.5839820Z W1204 09:39:43.207000 43583 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5840318Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a00be3c10f587c4d.xml
2025-12-04T10:11:57.5840419Z ============================= test session starts ==============================
2025-12-04T10:11:57.5840631Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5840701Z cachedir: .pytest_cache
2025-12-04T10:11:57.5841013Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5841093Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5841172Z configfile: pytest.ini
2025-12-04T10:11:57.5841495Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5841634Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5842211Z stepcurrent: skipping 16 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5842280Z Running 1 items in this shard
2025-12-04T10:11:57.5842285Z 
2025-12-04T10:11:57.5843025Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:39:44.283836033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5843029Z 
2025-12-04T10:11:57.5843333Z [W1204 09:39:53.321639950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5843336Z 
2025-12-04T10:11:57.5843631Z [W1204 09:39:53.321875844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5843678Z 
2025-12-04T10:11:57.5843967Z [W1204 09:39:53.327348429 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5843970Z 
2025-12-04T10:11:57.5844269Z [W1204 09:39:53.327857197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5844274Z 
2025-12-04T10:11:57.5844568Z [W1204 09:39:53.328017870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5844575Z 
2025-12-04T10:11:57.5844869Z [W1204 09:39:53.333251930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5844872Z 
2025-12-04T10:11:57.5845156Z [W1204 09:39:53.333757788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5845163Z 
2025-12-04T10:11:57.5845450Z [W1204 09:39:53.333912341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5845458Z 
2025-12-04T10:11:57.5845538Z ('RERUN', {'yellow': True}) [10.9202s] [100%]
2025-12-04T10:11:57.5846264Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:39:54.546756122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5846334Z 
2025-12-04T10:11:57.5846629Z [W1204 09:39:54.547304991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5846633Z 
2025-12-04T10:11:57.5846919Z [W1204 09:39:54.547449693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5846956Z 
2025-12-04T10:11:57.5847251Z [W1204 09:39:54.550500736 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5847254Z 
2025-12-04T10:11:57.5847543Z [W1204 09:39:54.551079386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5847547Z 
2025-12-04T10:11:57.5847840Z [W1204 09:39:54.551217388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5847844Z 
2025-12-04T10:11:57.5848132Z [W1204 09:39:54.555826056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5848135Z 
2025-12-04T10:11:57.5848421Z [W1204 09:39:54.556293915 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5848431Z 
2025-12-04T10:11:57.5848715Z [W1204 09:39:54.556444867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5848718Z 
2025-12-04T10:11:57.5848796Z ('RERUN', {'yellow': True}) [0.4542s] [100%]
2025-12-04T10:11:57.5849525Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:39:55.997789726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5849531Z 
2025-12-04T10:11:57.5849820Z [W1204 09:39:55.998331565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5849824Z 
2025-12-04T10:11:57.5850115Z [W1204 09:39:55.998477677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5850155Z 
2025-12-04T10:11:57.5850442Z [W1204 09:39:55.001517150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5850445Z 
2025-12-04T10:11:57.5850738Z [W1204 09:39:55.002087879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5850741Z 
2025-12-04T10:11:57.5851026Z [W1204 09:39:55.002229192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5851030Z 
2025-12-04T10:11:57.5851324Z [W1204 09:39:55.006832931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5851328Z 
2025-12-04T10:11:57.5851615Z [W1204 09:39:55.007295008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5851621Z 
2025-12-04T10:11:57.5851908Z [W1204 09:39:55.007433051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5851918Z 
2025-12-04T10:11:57.5851980Z FAILED [0.4493s] [100%]
2025-12-04T10:11:57.5851983Z 
2025-12-04T10:11:57.5852068Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5852374Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5852450Z Traceback (most recent call last):
2025-12-04T10:11:57.5853189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5853267Z     method(*args, **kwargs)
2025-12-04T10:11:57.5853563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5853666Z     method(*args, **kwargs)
2025-12-04T10:11:57.5853955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5854016Z     with policy():
2025-12-04T10:11:57.5854325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5854392Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5855211Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5855215Z 
2025-12-04T10:11:57.5855344Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5855873Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5855883Z 
2025-12-04T10:11:57.5856047Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5856176Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5856279Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5856636Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5856767Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5856832Z graph_break []
2025-12-04T10:11:57.5856955Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5857658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5857769Z   if out == self.unknown_value:
2025-12-04T10:11:57.5858063Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5858146Z Traceback (most recent call last):
2025-12-04T10:11:57.5858446Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5858515Z     method(*args, **kwargs)
2025-12-04T10:11:57.5858812Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5858876Z     method(*args, **kwargs)
2025-12-04T10:11:57.5859165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5859226Z     with policy():
2025-12-04T10:11:57.5859517Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5859587Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5860408Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5860412Z 
2025-12-04T10:11:57.5860610Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5861135Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5861173Z 
2025-12-04T10:11:57.5861335Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5861461Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5861555Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5861906Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5862034Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5862093Z graph_break []
2025-12-04T10:11:57.5862229Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5862914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5862990Z   if out == self.unknown_value:
2025-12-04T10:11:57.5863113Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5863203Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5863330Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5863673Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5863734Z graph_break []
2025-12-04T10:11:57.5863819Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5864112Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.5864199Z Traceback (most recent call last):
2025-12-04T10:11:57.5864509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5864613Z     method(*args, **kwargs)
2025-12-04T10:11:57.5864907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5864969Z     method(*args, **kwargs)
2025-12-04T10:11:57.5865262Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5865320Z     with policy():
2025-12-04T10:11:57.5865616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5865689Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5866515Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5866522Z 
2025-12-04T10:11:57.5866657Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5867179Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5867183Z 
2025-12-04T10:11:57.5867337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5867552Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5867644Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5867997Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5868166Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5868224Z graph_break []
2025-12-04T10:11:57.5868353Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5869043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5869116Z   if out == self.unknown_value:
2025-12-04T10:11:57.5869241Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5869332Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5869461Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5869798Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5869866Z graph_break []
2025-12-04T10:11:57.5869995Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5870083Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5870208Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5870546Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5870604Z graph_break []
2025-12-04T10:11:57.5871106Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a00be3c10f587c4d.xml -
2025-12-04T10:11:57.5871207Z =========================== short test summary info ============================
2025-12-04T10:11:57.5872515Z FAILED [0.4493s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5872559Z 
2025-12-04T10:11:57.5872687Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5873219Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5873222Z 
2025-12-04T10:11:57.5873379Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5873486Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5873611Z ================== 1 failed, 57 deselected, 2 rerun in 11.85s ==================
2025-12-04T10:11:57.5873669Z Got exit code 1
2025-12-04T10:11:57.5874150Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.5874394Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.5874660Z W1204 09:40:01.651000 43776 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5875131Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21bfb76ef730b721.xml
2025-12-04T10:11:57.5875230Z ============================= test session starts ==============================
2025-12-04T10:11:57.5875483Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5875549Z cachedir: .pytest_cache
2025-12-04T10:11:57.5875857Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5875937Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5876002Z configfile: pytest.ini
2025-12-04T10:11:57.5876316Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5876449Z collecting ... collected 58 items / 17 deselected / 41 selected
2025-12-04T10:11:57.5876538Z stepcurrent: skipping 17 already run items.
2025-12-04T10:11:57.5876614Z Running 41 items in this shard
2025-12-04T10:11:57.5876618Z 
2025-12-04T10:11:57.5877117Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8498s] [  2%]
2025-12-04T10:11:57.5877606Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4611s] [  2%]
2025-12-04T10:11:57.5878052Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4613s] [  2%]
2025-12-04T10:11:57.5878056Z 
2025-12-04T10:11:57.5878142Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5878445Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5878522Z Traceback (most recent call last):
2025-12-04T10:11:57.5878830Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5878940Z     method(*args, **kwargs)
2025-12-04T10:11:57.5879232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5879301Z     method(*args, **kwargs)
2025-12-04T10:11:57.5879589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5879647Z     with policy():
2025-12-04T10:11:57.5880000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5880069Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5880876Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5880884Z 
2025-12-04T10:11:57.5881011Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5881540Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5881549Z 
2025-12-04T10:11:57.5881706Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5881832Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5881939Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5882369Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5882502Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5882605Z graph_break []
2025-12-04T10:11:57.5882898Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5882979Z Traceback (most recent call last):
2025-12-04T10:11:57.5883278Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5883343Z     method(*args, **kwargs)
2025-12-04T10:11:57.5883638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5883700Z     method(*args, **kwargs)
2025-12-04T10:11:57.5883994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5884057Z     with policy():
2025-12-04T10:11:57.5884346Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5884418Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5885221Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5885225Z 
2025-12-04T10:11:57.5885356Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5885881Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5885885Z 
2025-12-04T10:11:57.5886053Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5886186Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5886319Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5886670Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5886797Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5886855Z graph_break []
2025-12-04T10:11:57.5886985Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5887074Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5887193Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5887543Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5887601Z graph_break []
2025-12-04T10:11:57.5887688Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5887981Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5888055Z Traceback (most recent call last):
2025-12-04T10:11:57.5888355Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5888418Z     method(*args, **kwargs)
2025-12-04T10:11:57.5888709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5888776Z     method(*args, **kwargs)
2025-12-04T10:11:57.5889135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5889200Z     with policy():
2025-12-04T10:11:57.5889489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5889590Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5890407Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5890411Z 
2025-12-04T10:11:57.5890534Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5891058Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5891061Z 
2025-12-04T10:11:57.5891213Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5891344Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5891449Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5891795Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5891923Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5891981Z graph_break []
2025-12-04T10:11:57.5892103Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5892195Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5892316Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5892659Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5892719Z graph_break []
2025-12-04T10:11:57.5892841Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5893003Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5893121Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5893457Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5893522Z graph_break []
2025-12-04T10:11:57.5894010Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21bfb76ef730b721.xml -
2025-12-04T10:11:57.5894116Z =========================== short test summary info ============================
2025-12-04T10:11:57.5895388Z FAILED [0.4613s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5895396Z 
2025-12-04T10:11:57.5895523Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5896036Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5896040Z 
2025-12-04T10:11:57.5896264Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5896376Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5896490Z ================== 1 failed, 17 deselected, 2 rerun in 2.80s ===================
2025-12-04T10:11:57.5896587Z Got exit code 1
2025-12-04T10:11:57.5896651Z Retrying single test...
2025-12-04T10:11:57.5896913Z W1204 09:40:11.309000 43957 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5897305Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51fe451ae52d8ee9.xml
2025-12-04T10:11:57.5897401Z ============================= test session starts ==============================
2025-12-04T10:11:57.5897613Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5897680Z cachedir: .pytest_cache
2025-12-04T10:11:57.5897990Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5898075Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5898141Z configfile: pytest.ini
2025-12-04T10:11:57.5898458Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5898595Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5899163Z stepcurrent: skipping 17 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5899244Z Running 1 items in this shard
2025-12-04T10:11:57.5899248Z 
2025-12-04T10:11:57.5899982Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:40:12.361453107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5899986Z 
2025-12-04T10:11:57.5900292Z [W1204 09:40:21.532843020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5900410Z 
2025-12-04T10:11:57.5900703Z [W1204 09:40:21.533137674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5900707Z 
2025-12-04T10:11:57.5900999Z [W1204 09:40:21.538963785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5901007Z 
2025-12-04T10:11:57.5901298Z [W1204 09:40:21.539536594 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5901301Z 
2025-12-04T10:11:57.5901592Z [W1204 09:40:21.539721196 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5901595Z 
2025-12-04T10:11:57.5901889Z [W1204 09:40:21.545317344 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5901894Z 
2025-12-04T10:11:57.5902183Z [W1204 09:40:21.545893903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5902187Z 
2025-12-04T10:11:57.5902482Z [W1204 09:40:21.546063626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5902486Z 
2025-12-04T10:11:57.5902568Z ('RERUN', {'yellow': True}) [11.0317s] [100%]
2025-12-04T10:11:57.5903362Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:40:22.719975453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5903366Z 
2025-12-04T10:11:57.5903657Z [W1204 09:40:22.720576483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5903696Z 
2025-12-04T10:11:57.5904002Z [W1204 09:40:22.720720386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5904005Z 
2025-12-04T10:11:57.5904296Z [W1204 09:40:22.723754688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5904300Z 
2025-12-04T10:11:57.5904587Z [W1204 09:40:22.724327598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5904595Z 
2025-12-04T10:11:57.5904885Z [W1204 09:40:22.724466050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5904888Z 
2025-12-04T10:11:57.5905176Z [W1204 09:40:22.729133051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5905183Z 
2025-12-04T10:11:57.5905475Z [W1204 09:40:22.729606539 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5905478Z 
2025-12-04T10:11:57.5905766Z [W1204 09:40:22.729743221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5905769Z 
2025-12-04T10:11:57.5905854Z ('RERUN', {'yellow': True}) [0.4189s] [100%]
2025-12-04T10:11:57.5906574Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:40:23.137224358 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5906578Z 
2025-12-04T10:11:57.5906875Z [W1204 09:40:23.137787498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5906923Z 
2025-12-04T10:11:57.5907216Z [W1204 09:40:23.137930160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5907219Z 
2025-12-04T10:11:57.5907505Z [W1204 09:40:23.141041024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5907514Z 
2025-12-04T10:11:57.5907800Z [W1204 09:40:23.141605194 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5907803Z 
2025-12-04T10:11:57.5908094Z [W1204 09:40:23.141764537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5908097Z 
2025-12-04T10:11:57.5908389Z [W1204 09:40:23.146417557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5908394Z 
2025-12-04T10:11:57.5908682Z [W1204 09:40:23.146881765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5908685Z 
2025-12-04T10:11:57.5908976Z [W1204 09:40:23.147018257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5908979Z 
2025-12-04T10:11:57.5909039Z FAILED [0.4145s] [100%]
2025-12-04T10:11:57.5909043Z 
2025-12-04T10:11:57.5909130Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5909484Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5909562Z Traceback (most recent call last):
2025-12-04T10:11:57.5909875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5909975Z     method(*args, **kwargs)
2025-12-04T10:11:57.5910273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5910343Z     method(*args, **kwargs)
2025-12-04T10:11:57.5910633Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5910696Z     with policy():
2025-12-04T10:11:57.5910989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5911053Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5911860Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5911867Z 
2025-12-04T10:11:57.5911998Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5912528Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5912532Z 
2025-12-04T10:11:57.5912689Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5912819Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5912912Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5913264Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5913405Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5913467Z graph_break []
2025-12-04T10:11:57.5913632Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5914329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5914401Z   if out == self.unknown_value:
2025-12-04T10:11:57.5914694Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5914767Z Traceback (most recent call last):
2025-12-04T10:11:57.5915075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5915139Z     method(*args, **kwargs)
2025-12-04T10:11:57.5915431Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5915503Z     method(*args, **kwargs)
2025-12-04T10:11:57.5915791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5915860Z     with policy():
2025-12-04T10:11:57.5916155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5916220Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5917329Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5917335Z 
2025-12-04T10:11:57.5917478Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5918021Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5918094Z 
2025-12-04T10:11:57.5918259Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5918391Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5918484Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5918833Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5918968Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5919027Z graph_break []
2025-12-04T10:11:57.5919151Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5919859Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5919970Z   if out == self.unknown_value:
2025-12-04T10:11:57.5920097Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5920189Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5920313Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5920664Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5920727Z graph_break []
2025-12-04T10:11:57.5920814Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5921106Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5921237Z Traceback (most recent call last):
2025-12-04T10:11:57.5921549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5921612Z     method(*args, **kwargs)
2025-12-04T10:11:57.5921903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5921970Z     method(*args, **kwargs)
2025-12-04T10:11:57.5922269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5922347Z     with policy():
2025-12-04T10:11:57.5922644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5922709Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5923532Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5923540Z 
2025-12-04T10:11:57.5923665Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5924188Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5924192Z 
2025-12-04T10:11:57.5924419Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5924545Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5924640Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5925017Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5925147Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5925205Z graph_break []
2025-12-04T10:11:57.5925328Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5926032Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5926103Z   if out == self.unknown_value:
2025-12-04T10:11:57.5926231Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5926320Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5926443Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5926796Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5926855Z graph_break []
2025-12-04T10:11:57.5926977Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5927069Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5927188Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5927534Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5927593Z graph_break []
2025-12-04T10:11:57.5928082Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51fe451ae52d8ee9.xml -
2025-12-04T10:11:57.5928200Z =========================== short test summary info ============================
2025-12-04T10:11:57.5929522Z FAILED [0.4145s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5929526Z 
2025-12-04T10:11:57.5929659Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5930177Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5930182Z 
2025-12-04T10:11:57.5930340Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5930447Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5930562Z ================== 1 failed, 57 deselected, 2 rerun in 11.89s ==================
2025-12-04T10:11:57.5930627Z Got exit code 1
2025-12-04T10:11:57.5930690Z Retrying single test...
2025-12-04T10:11:57.5930963Z W1204 09:40:29.793000 44143 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5931417Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7e2b876b221ae6e.xml
2025-12-04T10:11:57.5931515Z ============================= test session starts ==============================
2025-12-04T10:11:57.5931726Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5931825Z cachedir: .pytest_cache
2025-12-04T10:11:57.5932133Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5932216Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5932281Z configfile: pytest.ini
2025-12-04T10:11:57.5932600Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5932728Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5933298Z stepcurrent: skipping 17 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5933380Z Running 1 items in this shard
2025-12-04T10:11:57.5933384Z 
2025-12-04T10:11:57.5934115Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:40:30.847224093 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5934121Z 
2025-12-04T10:11:57.5934421Z [W1204 09:40:39.837374438 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5934424Z 
2025-12-04T10:11:57.5934716Z [W1204 09:40:39.837622422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5934720Z 
2025-12-04T10:11:57.5935019Z [W1204 09:40:39.843460002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5935022Z 
2025-12-04T10:11:57.5935313Z [W1204 09:40:39.844035652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5935318Z 
2025-12-04T10:11:57.5935664Z [W1204 09:40:39.844218305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5935667Z 
2025-12-04T10:11:57.5935955Z [W1204 09:40:39.849620027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5935958Z 
2025-12-04T10:11:57.5936247Z [W1204 09:40:39.850187067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5936256Z 
2025-12-04T10:11:57.5936546Z [W1204 09:40:39.850368960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5936549Z 
2025-12-04T10:11:57.5936633Z ('RERUN', {'yellow': True}) [10.8515s] [100%]
2025-12-04T10:11:57.5937356Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:40:41.027024747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5937361Z 
2025-12-04T10:11:57.5937653Z [W1204 09:40:41.027609367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5937656Z 
2025-12-04T10:11:57.5937948Z [W1204 09:40:41.027754250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5937952Z 
2025-12-04T10:11:57.5938308Z [W1204 09:40:41.030698590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5938312Z 
2025-12-04T10:11:57.5938607Z [W1204 09:40:41.031264999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5938610Z 
2025-12-04T10:11:57.5938930Z [W1204 09:40:41.031402422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5938936Z 
2025-12-04T10:11:57.5939229Z [W1204 09:40:41.035876348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5939237Z 
2025-12-04T10:11:57.5939523Z [W1204 09:40:41.036344036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5939526Z 
2025-12-04T10:11:57.5939815Z [W1204 09:40:41.036480339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5939820Z 
2025-12-04T10:11:57.5939905Z ('RERUN', {'yellow': True}) [0.4156s] [100%]
2025-12-04T10:11:57.5940623Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:40:41.440301269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5940629Z 
2025-12-04T10:11:57.5940929Z [W1204 09:40:41.440874559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5940932Z 
2025-12-04T10:11:57.5941219Z [W1204 09:40:41.441018661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5941223Z 
2025-12-04T10:11:57.5941519Z [W1204 09:40:41.443919671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5941522Z 
2025-12-04T10:11:57.5941811Z [W1204 09:40:41.444483660 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5941815Z 
2025-12-04T10:11:57.5942113Z [W1204 09:40:41.444621693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5942154Z 
2025-12-04T10:11:57.5942443Z [W1204 09:40:41.449076339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5942447Z 
2025-12-04T10:11:57.5942739Z [W1204 09:40:41.449526457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5942748Z 
2025-12-04T10:11:57.5943041Z [W1204 09:40:41.449662209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5943047Z 
2025-12-04T10:11:57.5943113Z FAILED [0.4107s] [100%]
2025-12-04T10:11:57.5943116Z 
2025-12-04T10:11:57.5943205Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5943501Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5943584Z Traceback (most recent call last):
2025-12-04T10:11:57.5943894Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5943959Z     method(*args, **kwargs)
2025-12-04T10:11:57.5944255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5944318Z     method(*args, **kwargs)
2025-12-04T10:11:57.5944610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5944768Z     with policy():
2025-12-04T10:11:57.5945067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5945137Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5945936Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5945976Z 
2025-12-04T10:11:57.5946113Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5946643Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5946647Z 
2025-12-04T10:11:57.5946807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5946940Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5947034Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5947385Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5947517Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5947576Z graph_break []
2025-12-04T10:11:57.5947704Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5948396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5948468Z   if out == self.unknown_value:
2025-12-04T10:11:57.5948763Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5948836Z Traceback (most recent call last):
2025-12-04T10:11:57.5949141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5949244Z     method(*args, **kwargs)
2025-12-04T10:11:57.5949536Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5949603Z     method(*args, **kwargs)
2025-12-04T10:11:57.5949892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5949951Z     with policy():
2025-12-04T10:11:57.5950248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5950315Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5951128Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5951135Z 
2025-12-04T10:11:57.5951260Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5951784Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5951787Z 
2025-12-04T10:11:57.5951942Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5952067Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5952235Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5952581Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5952745Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5952806Z graph_break []
2025-12-04T10:11:57.5952928Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5953625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5953696Z   if out == self.unknown_value:
2025-12-04T10:11:57.5953818Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5953917Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5954044Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5954401Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5954465Z graph_break []
2025-12-04T10:11:57.5954549Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5954843Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.5954915Z Traceback (most recent call last):
2025-12-04T10:11:57.5955217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5955281Z     method(*args, **kwargs)
2025-12-04T10:11:57.5955577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5955646Z     method(*args, **kwargs)
2025-12-04T10:11:57.5955933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5955995Z     with policy():
2025-12-04T10:11:57.5956336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5956402Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5957214Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5957218Z 
2025-12-04T10:11:57.5957343Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5957862Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5957871Z 
2025-12-04T10:11:57.5958034Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5958159Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5958265Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5958612Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5958737Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5958803Z graph_break []
2025-12-04T10:11:57.5958927Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.5959693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.5959794Z   if out == self.unknown_value:
2025-12-04T10:11:57.5959970Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5960073Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5960196Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5960544Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5960603Z graph_break []
2025-12-04T10:11:57.5960727Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5960823Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5960945Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5961284Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5961350Z graph_break []
2025-12-04T10:11:57.5961836Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7e2b876b221ae6e.xml -
2025-12-04T10:11:57.5961943Z =========================== short test summary info ============================
2025-12-04T10:11:57.5963228Z FAILED [0.4107s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5963232Z 
2025-12-04T10:11:57.5963362Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5963920Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5963924Z 
2025-12-04T10:11:57.5964082Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5964186Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5964299Z ================== 1 failed, 57 deselected, 2 rerun in 11.70s ==================
2025-12-04T10:11:57.5964362Z Got exit code 1
2025-12-04T10:11:57.5964838Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.5965081Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.5965351Z W1204 09:40:48.085000 44329 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5965741Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a296a1ae2f954511.xml
2025-12-04T10:11:57.5965839Z ============================= test session starts ==============================
2025-12-04T10:11:57.5966043Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5966108Z cachedir: .pytest_cache
2025-12-04T10:11:57.5966498Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5966577Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5966650Z configfile: pytest.ini
2025-12-04T10:11:57.5966967Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5967130Z collecting ... collected 58 items / 18 deselected / 40 selected
2025-12-04T10:11:57.5967222Z stepcurrent: skipping 18 already run items.
2025-12-04T10:11:57.5967293Z Running 40 items in this shard
2025-12-04T10:11:57.5967296Z 
2025-12-04T10:11:57.5967792Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9077s] [  2%]
2025-12-04T10:11:57.5968296Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4514s] [  2%]
2025-12-04T10:11:57.5968736Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4490s] [  2%]
2025-12-04T10:11:57.5968740Z 
2025-12-04T10:11:57.5968837Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.5969129Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5969208Z Traceback (most recent call last):
2025-12-04T10:11:57.5969514Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5969577Z     method(*args, **kwargs)
2025-12-04T10:11:57.5969873Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5969936Z     method(*args, **kwargs)
2025-12-04T10:11:57.5970224Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5970291Z     with policy():
2025-12-04T10:11:57.5970761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5970920Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5971724Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.5971728Z 
2025-12-04T10:11:57.5971856Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5972384Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5972388Z 
2025-12-04T10:11:57.5972548Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5972681Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5972778Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5973129Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5973255Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5973314Z graph_break []
2025-12-04T10:11:57.5973608Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5973681Z Traceback (most recent call last):
2025-12-04T10:11:57.5974072Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5974149Z     method(*args, **kwargs)
2025-12-04T10:11:57.5974444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5974553Z     method(*args, **kwargs)
2025-12-04T10:11:57.5974841Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5974902Z     with policy():
2025-12-04T10:11:57.5975199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5975264Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5976071Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.5976081Z 
2025-12-04T10:11:57.5976205Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5976723Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5976728Z 
2025-12-04T10:11:57.5976888Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5977018Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5977113Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5977457Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5977584Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5977650Z graph_break []
2025-12-04T10:11:57.5977772Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5977869Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5978036Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5978377Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5978440Z graph_break []
2025-12-04T10:11:57.5978522Z =================================== FAILURES ===================================
2025-12-04T10:11:57.5978809Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.5978892Z Traceback (most recent call last):
2025-12-04T10:11:57.5979193Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5979262Z     method(*args, **kwargs)
2025-12-04T10:11:57.5979549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.5979616Z     method(*args, **kwargs)
2025-12-04T10:11:57.5979906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.5979965Z     with policy():
2025-12-04T10:11:57.5980260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.5980332Z     raise RuntimeError(msg)
2025-12-04T10:11:57.5981204Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5981208Z 
2025-12-04T10:11:57.5981338Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5981890Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5981894Z 
2025-12-04T10:11:57.5982055Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5982179Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5982267Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5982626Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5982757Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5982818Z graph_break []
2025-12-04T10:11:57.5982948Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5983039Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5983166Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5983507Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5983570Z graph_break []
2025-12-04T10:11:57.5983697Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.5983786Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.5983910Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.5984261Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.5984320Z graph_break []
2025-12-04T10:11:57.5984813Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a296a1ae2f954511.xml -
2025-12-04T10:11:57.5984951Z =========================== short test summary info ============================
2025-12-04T10:11:57.5986238Z FAILED [0.4490s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.5986242Z 
2025-12-04T10:11:57.5986365Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.5986878Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5986888Z 
2025-12-04T10:11:57.5987041Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.5987144Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.5987264Z ================== 1 failed, 18 deselected, 2 rerun in 2.83s ===================
2025-12-04T10:11:57.5987323Z Got exit code 1
2025-12-04T10:11:57.5987386Z Retrying single test...
2025-12-04T10:11:57.5987663Z W1204 09:40:57.744000 44510 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.5988121Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-173808d08d9ed556.xml
2025-12-04T10:11:57.5988229Z ============================= test session starts ==============================
2025-12-04T10:11:57.5988443Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.5988544Z cachedir: .pytest_cache
2025-12-04T10:11:57.5988858Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.5988933Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.5988997Z configfile: pytest.ini
2025-12-04T10:11:57.5989316Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.5989445Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.5990024Z stepcurrent: skipping 18 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.5990096Z Running 1 items in this shard
2025-12-04T10:11:57.5990100Z 
2025-12-04T10:11:57.5990835Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:40:59.021696814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5990841Z 
2025-12-04T10:11:57.5991138Z [W1204 09:41:08.316248202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5991142Z 
2025-12-04T10:11:57.5991435Z [W1204 09:41:08.316516437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5991446Z 
2025-12-04T10:11:57.5991734Z [W1204 09:41:08.322455048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5991737Z 
2025-12-04T10:11:57.5992025Z [W1204 09:41:08.323064639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5992065Z 
2025-12-04T10:11:57.5992356Z [W1204 09:41:08.323240092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5992359Z 
2025-12-04T10:11:57.5992647Z [W1204 09:41:08.328750066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5992649Z 
2025-12-04T10:11:57.5992939Z [W1204 09:41:08.329296635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5992943Z 
2025-12-04T10:11:57.5993231Z [W1204 09:41:08.329485538 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5993235Z 
2025-12-04T10:11:57.5993323Z ('RERUN', {'yellow': True}) [11.2000s] [100%]
2025-12-04T10:11:57.5994046Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:41:09.326363869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5994051Z 
2025-12-04T10:11:57.5994342Z [W1204 09:41:09.326925339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5994345Z 
2025-12-04T10:11:57.5994634Z [W1204 09:41:09.327066121 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5994638Z 
2025-12-04T10:11:57.5994991Z [W1204 09:41:09.329945051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5994995Z 
2025-12-04T10:11:57.5995289Z [W1204 09:41:09.330521500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5995331Z 
2025-12-04T10:11:57.5995617Z [W1204 09:41:09.330664423 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5995620Z 
2025-12-04T10:11:57.5995913Z [W1204 09:41:09.335123039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5995916Z 
2025-12-04T10:11:57.5996203Z [W1204 09:41:09.335576517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5996207Z 
2025-12-04T10:11:57.5996499Z [W1204 09:41:09.335714099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5996502Z 
2025-12-04T10:11:57.5996580Z ('RERUN', {'yellow': True}) [0.4173s] [100%]
2025-12-04T10:11:57.5997306Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:41:09.740855041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5997313Z 
2025-12-04T10:11:57.5997600Z [W1204 09:41:09.741404211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5997603Z 
2025-12-04T10:11:57.5997901Z [W1204 09:41:09.741548003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5997908Z 
2025-12-04T10:11:57.5998197Z [W1204 09:41:09.744401562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5998200Z 
2025-12-04T10:11:57.5998488Z [W1204 09:41:09.744941411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5998548Z 
2025-12-04T10:11:57.5998842Z [W1204 09:41:09.745080473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5998845Z 
2025-12-04T10:11:57.5999133Z [W1204 09:41:09.749483278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5999136Z 
2025-12-04T10:11:57.5999429Z [W1204 09:41:09.749935466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5999432Z 
2025-12-04T10:11:57.5999722Z [W1204 09:41:09.750092239 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.5999725Z 
2025-12-04T10:11:57.5999793Z FAILED [0.4143s] [100%]
2025-12-04T10:11:57.5999796Z 
2025-12-04T10:11:57.5999959Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6000266Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6000346Z Traceback (most recent call last):
2025-12-04T10:11:57.6000654Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6000724Z     method(*args, **kwargs)
2025-12-04T10:11:57.6001021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6001085Z     method(*args, **kwargs)
2025-12-04T10:11:57.6001454Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6001516Z     with policy():
2025-12-04T10:11:57.6001809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6001918Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6002711Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6002715Z 
2025-12-04T10:11:57.6002848Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6003371Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6003376Z 
2025-12-04T10:11:57.6003537Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6003666Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6003762Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6004117Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6004246Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6004321Z graph_break []
2025-12-04T10:11:57.6004449Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6005143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6005216Z   if out == self.unknown_value:
2025-12-04T10:11:57.6005506Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6005627Z Traceback (most recent call last):
2025-12-04T10:11:57.6005940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6006003Z     method(*args, **kwargs)
2025-12-04T10:11:57.6006295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6006356Z     method(*args, **kwargs)
2025-12-04T10:11:57.6006644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6006708Z     with policy():
2025-12-04T10:11:57.6007001Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6007066Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6007879Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6007886Z 
2025-12-04T10:11:57.6008010Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6008535Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6008538Z 
2025-12-04T10:11:57.6008758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6008889Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6008981Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6009325Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6009571Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6009629Z graph_break []
2025-12-04T10:11:57.6009753Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6010445Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6010513Z   if out == self.unknown_value:
2025-12-04T10:11:57.6010638Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6010730Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6010851Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6011209Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6011271Z graph_break []
2025-12-04T10:11:57.6011360Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6011645Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6011717Z Traceback (most recent call last):
2025-12-04T10:11:57.6012018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6012083Z     method(*args, **kwargs)
2025-12-04T10:11:57.6012379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6012441Z     method(*args, **kwargs)
2025-12-04T10:11:57.6012725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6012827Z     with policy():
2025-12-04T10:11:57.6013120Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6013185Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6014000Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6014006Z 
2025-12-04T10:11:57.6014129Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6014653Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6014659Z 
2025-12-04T10:11:57.6014813Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6014939Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6015030Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6015371Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6015496Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6015556Z graph_break []
2025-12-04T10:11:57.6015742Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6016433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6016535Z   if out == self.unknown_value:
2025-12-04T10:11:57.6016659Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6016747Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6016869Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6017404Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6017464Z graph_break []
2025-12-04T10:11:57.6017592Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6017678Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6017799Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6018151Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6018211Z graph_break []
2025-12-04T10:11:57.6018703Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-173808d08d9ed556.xml -
2025-12-04T10:11:57.6018799Z =========================== short test summary info ============================
2025-12-04T10:11:57.6020083Z FAILED [0.4143s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6020162Z 
2025-12-04T10:11:57.6020290Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6020803Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6020807Z 
2025-12-04T10:11:57.6020964Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6021067Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6021199Z ================== 1 failed, 57 deselected, 2 rerun in 12.05s ==================
2025-12-04T10:11:57.6021261Z Got exit code 1
2025-12-04T10:11:57.6021331Z Retrying single test...
2025-12-04T10:11:57.6021605Z W1204 09:41:16.376000 44696 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6021999Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0128790a7f0c548c.xml
2025-12-04T10:11:57.6022098Z ============================= test session starts ==============================
2025-12-04T10:11:57.6022309Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6022373Z cachedir: .pytest_cache
2025-12-04T10:11:57.6022681Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6022756Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6022822Z configfile: pytest.ini
2025-12-04T10:11:57.6023237Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6023369Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6023945Z stepcurrent: skipping 18 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6024084Z Running 1 items in this shard
2025-12-04T10:11:57.6024088Z 
2025-12-04T10:11:57.6024813Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:41:17.646103076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6024818Z 
2025-12-04T10:11:57.6025134Z [W1204 09:41:26.789116142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6025138Z 
2025-12-04T10:11:57.6025426Z [W1204 09:41:26.789372466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6025431Z 
2025-12-04T10:11:57.6025726Z [W1204 09:41:26.795166008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6025730Z 
2025-12-04T10:11:57.6026015Z [W1204 09:41:26.795757888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6026018Z 
2025-12-04T10:11:57.6026308Z [W1204 09:41:26.795930621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6026312Z 
2025-12-04T10:11:57.6026601Z [W1204 09:41:26.801429498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6026604Z 
2025-12-04T10:11:57.6026896Z [W1204 09:41:26.801977777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6026900Z 
2025-12-04T10:11:57.6027190Z [W1204 09:41:26.802167811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6027231Z 
2025-12-04T10:11:57.6027313Z ('RERUN', {'yellow': True}) [11.0362s] [100%]
2025-12-04T10:11:57.6028044Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:41:27.792403230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6028048Z 
2025-12-04T10:11:57.6028337Z [W1204 09:41:27.792964500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6028340Z 
2025-12-04T10:11:57.6028631Z [W1204 09:41:27.793105092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6028636Z 
2025-12-04T10:11:57.6028923Z [W1204 09:41:27.796037163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6028925Z 
2025-12-04T10:11:57.6029215Z [W1204 09:41:27.796603803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6029218Z 
2025-12-04T10:11:57.6029508Z [W1204 09:41:27.796742596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6029511Z 
2025-12-04T10:11:57.6029870Z [W1204 09:41:27.801273544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6029874Z 
2025-12-04T10:11:57.6030163Z [W1204 09:41:27.801735372 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6030166Z 
2025-12-04T10:11:57.6030494Z [W1204 09:41:27.801872205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6030499Z 
2025-12-04T10:11:57.6030577Z ('RERUN', {'yellow': True}) [0.4147s] [100%]
2025-12-04T10:11:57.6031292Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:41:28.205184171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6031301Z 
2025-12-04T10:11:57.6031593Z [W1204 09:41:28.205745750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6031596Z 
2025-12-04T10:11:57.6031881Z [W1204 09:41:28.205887213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6031884Z 
2025-12-04T10:11:57.6032173Z [W1204 09:41:28.208811654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6032178Z 
2025-12-04T10:11:57.6032463Z [W1204 09:41:28.209359463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6032466Z 
2025-12-04T10:11:57.6032758Z [W1204 09:41:28.209499556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6032762Z 
2025-12-04T10:11:57.6033054Z [W1204 09:41:28.213936193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6033057Z 
2025-12-04T10:11:57.6033344Z [W1204 09:41:28.214386031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6033347Z 
2025-12-04T10:11:57.6033633Z [W1204 09:41:28.214522374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6033673Z 
2025-12-04T10:11:57.6033734Z FAILED [0.4118s] [100%]
2025-12-04T10:11:57.6033742Z 
2025-12-04T10:11:57.6033823Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6034115Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6034192Z Traceback (most recent call last):
2025-12-04T10:11:57.6034498Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6034564Z     method(*args, **kwargs)
2025-12-04T10:11:57.6034860Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6034924Z     method(*args, **kwargs)
2025-12-04T10:11:57.6035217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6035277Z     with policy():
2025-12-04T10:11:57.6035568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6035637Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6036429Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6036504Z 
2025-12-04T10:11:57.6036645Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6037173Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6037211Z 
2025-12-04T10:11:57.6037367Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6037501Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6037593Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6037954Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6038080Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6038138Z graph_break []
2025-12-04T10:11:57.6038269Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6038964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6039041Z   if out == self.unknown_value:
2025-12-04T10:11:57.6039331Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6039407Z Traceback (most recent call last):
2025-12-04T10:11:57.6039719Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6039781Z     method(*args, **kwargs)
2025-12-04T10:11:57.6040122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6040193Z     method(*args, **kwargs)
2025-12-04T10:11:57.6040481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6040543Z     with policy():
2025-12-04T10:11:57.6040838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6040943Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6041753Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6041757Z 
2025-12-04T10:11:57.6041883Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6042405Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6042409Z 
2025-12-04T10:11:57.6042563Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6042691Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6042788Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6043148Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6043282Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6043339Z graph_break []
2025-12-04T10:11:57.6043460Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6044220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6044289Z   if out == self.unknown_value:
2025-12-04T10:11:57.6044454Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6044546Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6044669Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6045014Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6045072Z graph_break []
2025-12-04T10:11:57.6045155Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6045454Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6045527Z Traceback (most recent call last):
2025-12-04T10:11:57.6045826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6045902Z     method(*args, **kwargs)
2025-12-04T10:11:57.6046196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6046264Z     method(*args, **kwargs)
2025-12-04T10:11:57.6046550Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6046614Z     with policy():
2025-12-04T10:11:57.6046904Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6046970Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6047783Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6047788Z 
2025-12-04T10:11:57.6047953Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6048471Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6048475Z 
2025-12-04T10:11:57.6048628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6048751Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6048845Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6049190Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6049316Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6049376Z graph_break []
2025-12-04T10:11:57.6049503Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6050194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6050262Z   if out == self.unknown_value:
2025-12-04T10:11:57.6050387Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6050476Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6050597Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6051031Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6051102Z graph_break []
2025-12-04T10:11:57.6051258Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6051352Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6051473Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6051820Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6051880Z graph_break []
2025-12-04T10:11:57.6052369Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0128790a7f0c548c.xml -
2025-12-04T10:11:57.6052476Z =========================== short test summary info ============================
2025-12-04T10:11:57.6053759Z FAILED [0.4118s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6053766Z 
2025-12-04T10:11:57.6053893Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6054412Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6054416Z 
2025-12-04T10:11:57.6054573Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6054676Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6054791Z ================== 1 failed, 57 deselected, 2 rerun in 11.89s ==================
2025-12-04T10:11:57.6054895Z Got exit code 1
2025-12-04T10:11:57.6055366Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6055614Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.6055877Z W1204 09:41:34.834000 44882 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6056264Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2efc3529636beb3d.xml
2025-12-04T10:11:57.6056366Z ============================= test session starts ==============================
2025-12-04T10:11:57.6056574Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6056643Z cachedir: .pytest_cache
2025-12-04T10:11:57.6056952Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6057028Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6057095Z configfile: pytest.ini
2025-12-04T10:11:57.6057406Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6057535Z collecting ... collected 58 items / 19 deselected / 39 selected
2025-12-04T10:11:57.6057628Z stepcurrent: skipping 19 already run items.
2025-12-04T10:11:57.6057696Z Running 39 items in this shard
2025-12-04T10:11:57.6057699Z 
2025-12-04T10:11:57.6058279Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8837s] [  2%]
2025-12-04T10:11:57.6058774Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4820s] [  2%]
2025-12-04T10:11:57.6059255Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4932s] [  2%]
2025-12-04T10:11:57.6059262Z 
2025-12-04T10:11:57.6059348Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6059643Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6059723Z Traceback (most recent call last):
2025-12-04T10:11:57.6060033Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6060109Z     method(*args, **kwargs)
2025-12-04T10:11:57.6060415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6060482Z     method(*args, **kwargs)
2025-12-04T10:11:57.6060775Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6060833Z     with policy():
2025-12-04T10:11:57.6061124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6061192Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6062005Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6062009Z 
2025-12-04T10:11:57.6062146Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6062707Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6062711Z 
2025-12-04T10:11:57.6062868Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6063001Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6063095Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6063453Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6063580Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6063638Z graph_break []
2025-12-04T10:11:57.6063935Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6064012Z Traceback (most recent call last):
2025-12-04T10:11:57.6064311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6064375Z     method(*args, **kwargs)
2025-12-04T10:11:57.6064665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6064732Z     method(*args, **kwargs)
2025-12-04T10:11:57.6065019Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6065078Z     with policy():
2025-12-04T10:11:57.6065445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6065511Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6066332Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6066370Z 
2025-12-04T10:11:57.6066497Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6067020Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6067024Z 
2025-12-04T10:11:57.6067184Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6067311Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6067409Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6067756Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6067896Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6067953Z graph_break []
2025-12-04T10:11:57.6068077Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6068169Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6068300Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6068648Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6068715Z graph_break []
2025-12-04T10:11:57.6068797Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6069092Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6069206Z Traceback (most recent call last):
2025-12-04T10:11:57.6069503Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6069576Z     method(*args, **kwargs)
2025-12-04T10:11:57.6069866Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6069929Z     method(*args, **kwargs)
2025-12-04T10:11:57.6070221Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6070282Z     with policy():
2025-12-04T10:11:57.6070579Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6070643Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6071461Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6071468Z 
2025-12-04T10:11:57.6071597Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6072117Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6072121Z 
2025-12-04T10:11:57.6072345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6072472Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6072561Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6072937Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6073061Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6073120Z graph_break []
2025-12-04T10:11:57.6073255Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6073347Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6073470Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6073808Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6073871Z graph_break []
2025-12-04T10:11:57.6073991Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6074079Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6074208Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6074549Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6074606Z graph_break []
2025-12-04T10:11:57.6075100Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2efc3529636beb3d.xml -
2025-12-04T10:11:57.6075199Z =========================== short test summary info ============================
2025-12-04T10:11:57.6076502Z FAILED [0.4932s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6076546Z 
2025-12-04T10:11:57.6076670Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6077192Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6077197Z 
2025-12-04T10:11:57.6077349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6077454Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6077571Z ================== 1 failed, 19 deselected, 2 rerun in 2.88s ===================
2025-12-04T10:11:57.6077631Z Got exit code 1
2025-12-04T10:11:57.6077700Z Retrying single test...
2025-12-04T10:11:57.6077966Z W1204 09:41:44.593000 45070 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6078352Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5778c6a42245e5c5.xml
2025-12-04T10:11:57.6078453Z ============================= test session starts ==============================
2025-12-04T10:11:57.6078659Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6078723Z cachedir: .pytest_cache
2025-12-04T10:11:57.6079124Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6079205Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6079275Z configfile: pytest.ini
2025-12-04T10:11:57.6079593Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6079757Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6080379Z stepcurrent: skipping 19 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6080451Z Running 1 items in this shard
2025-12-04T10:11:57.6080454Z 
2025-12-04T10:11:57.6081192Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:41:45.673852341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6081196Z 
2025-12-04T10:11:57.6081493Z [W1204 09:41:54.809742327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6081499Z 
2025-12-04T10:11:57.6081792Z [W1204 09:41:54.809986691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6081797Z 
2025-12-04T10:11:57.6082087Z [W1204 09:41:54.815805031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6082091Z 
2025-12-04T10:11:57.6082382Z [W1204 09:41:54.816403531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6082386Z 
2025-12-04T10:11:57.6082675Z [W1204 09:41:54.816586214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6082679Z 
2025-12-04T10:11:57.6082964Z [W1204 09:41:54.822002736 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6082967Z 
2025-12-04T10:11:57.6083258Z [W1204 09:41:54.822535746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6083301Z 
2025-12-04T10:11:57.6083588Z [W1204 09:41:54.822698198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6083592Z 
2025-12-04T10:11:57.6083678Z ('RERUN', {'yellow': True}) [11.0162s] [100%]
2025-12-04T10:11:57.6084403Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:41:56.025971803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6084407Z 
2025-12-04T10:11:57.6084703Z [W1204 09:41:56.026497112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6084710Z 
2025-12-04T10:11:57.6085001Z [W1204 09:41:56.026641925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6085006Z 
2025-12-04T10:11:57.6085297Z [W1204 09:41:56.029614446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6085300Z 
2025-12-04T10:11:57.6085586Z [W1204 09:41:56.030200525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6085589Z 
2025-12-04T10:11:57.6085876Z [W1204 09:41:56.030342298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6085949Z 
2025-12-04T10:11:57.6086237Z [W1204 09:41:56.034936017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6086240Z 
2025-12-04T10:11:57.6086526Z [W1204 09:41:56.035392124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6086577Z 
2025-12-04T10:11:57.6086868Z [W1204 09:41:56.035528347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6086871Z 
2025-12-04T10:11:57.6086949Z ('RERUN', {'yellow': True}) [0.4493s] [100%]
2025-12-04T10:11:57.6087681Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:41:56.473748002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6087687Z 
2025-12-04T10:11:57.6087974Z [W1204 09:41:56.474278791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6087977Z 
2025-12-04T10:11:57.6088266Z [W1204 09:41:56.474419483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6088272Z 
2025-12-04T10:11:57.6088556Z [W1204 09:41:56.477410564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6088560Z 
2025-12-04T10:11:57.6088848Z [W1204 09:41:56.477974154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6088851Z 
2025-12-04T10:11:57.6089137Z [W1204 09:41:56.478116326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6089143Z 
2025-12-04T10:11:57.6089429Z [W1204 09:41:56.482739395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6089439Z 
2025-12-04T10:11:57.6089733Z [W1204 09:41:56.483203563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6089773Z 
2025-12-04T10:11:57.6090060Z [W1204 09:41:56.483345405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6090063Z 
2025-12-04T10:11:57.6090128Z FAILED [0.4462s] [100%]
2025-12-04T10:11:57.6090132Z 
2025-12-04T10:11:57.6090214Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6090516Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6090590Z Traceback (most recent call last):
2025-12-04T10:11:57.6090896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6090965Z     method(*args, **kwargs)
2025-12-04T10:11:57.6091258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6091323Z     method(*args, **kwargs)
2025-12-04T10:11:57.6091621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6091679Z     with policy():
2025-12-04T10:11:57.6091975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6092039Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6092913Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6092925Z 
2025-12-04T10:11:57.6093056Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6093613Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6093617Z 
2025-12-04T10:11:57.6093781Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6093908Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6094005Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6094355Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6094486Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6094549Z graph_break []
2025-12-04T10:11:57.6094671Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6095378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6095451Z   if out == self.unknown_value:
2025-12-04T10:11:57.6095743Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6095822Z Traceback (most recent call last):
2025-12-04T10:11:57.6096122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6096189Z     method(*args, **kwargs)
2025-12-04T10:11:57.6096481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6096542Z     method(*args, **kwargs)
2025-12-04T10:11:57.6096835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6096935Z     with policy():
2025-12-04T10:11:57.6097226Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6097295Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6098114Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6098120Z 
2025-12-04T10:11:57.6098253Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6098775Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6098782Z 
2025-12-04T10:11:57.6098944Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6099068Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6099160Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6099526Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6099654Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6099712Z graph_break []
2025-12-04T10:11:57.6099906Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6100595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6100704Z   if out == self.unknown_value:
2025-12-04T10:11:57.6100825Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6100913Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6101035Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6101375Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6101435Z graph_break []
2025-12-04T10:11:57.6101519Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6101809Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6101883Z Traceback (most recent call last):
2025-12-04T10:11:57.6102187Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6102250Z     method(*args, **kwargs)
2025-12-04T10:11:57.6102543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6102609Z     method(*args, **kwargs)
2025-12-04T10:11:57.6102898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6102958Z     with policy():
2025-12-04T10:11:57.6103251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6103321Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6104143Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6104202Z 
2025-12-04T10:11:57.6104330Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6104849Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6104853Z 
2025-12-04T10:11:57.6105006Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6105134Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6105223Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6105567Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6105691Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6105747Z graph_break []
2025-12-04T10:11:57.6105873Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6106555Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6106634Z   if out == self.unknown_value:
2025-12-04T10:11:57.6106832Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6106921Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6107046Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6107383Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6107476Z graph_break []
2025-12-04T10:11:57.6107603Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6107691Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6107813Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6108155Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6108213Z graph_break []
2025-12-04T10:11:57.6108702Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5778c6a42245e5c5.xml -
2025-12-04T10:11:57.6108801Z =========================== short test summary info ============================
2025-12-04T10:11:57.6110104Z FAILED [0.4462s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6110110Z 
2025-12-04T10:11:57.6110236Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6110760Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6110764Z 
2025-12-04T10:11:57.6110918Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6111058Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6111178Z ================== 1 failed, 57 deselected, 2 rerun in 11.94s ==================
2025-12-04T10:11:57.6111237Z Got exit code 1
2025-12-04T10:11:57.6111307Z Retrying single test...
2025-12-04T10:11:57.6111572Z W1204 09:42:03.120000 45263 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6111955Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6cbd45f232782bc2.xml
2025-12-04T10:11:57.6112058Z ============================= test session starts ==============================
2025-12-04T10:11:57.6112270Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6112334Z cachedir: .pytest_cache
2025-12-04T10:11:57.6112643Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6112721Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6112789Z configfile: pytest.ini
2025-12-04T10:11:57.6113102Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6113229Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6113805Z stepcurrent: skipping 19 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6113948Z Running 1 items in this shard
2025-12-04T10:11:57.6113951Z 
2025-12-04T10:11:57.6114686Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:42:04.235878468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6114724Z 
2025-12-04T10:11:57.6115024Z [W1204 09:42:13.430814353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6115027Z 
2025-12-04T10:11:57.6115321Z [W1204 09:42:13.431069627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6115324Z 
2025-12-04T10:11:57.6115613Z [W1204 09:42:13.436926356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6115617Z 
2025-12-04T10:11:57.6115906Z [W1204 09:42:13.437513156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6115913Z 
2025-12-04T10:11:57.6116200Z [W1204 09:42:13.437683089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6116206Z 
2025-12-04T10:11:57.6116490Z [W1204 09:42:13.443252303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6116493Z 
2025-12-04T10:11:57.6116784Z [W1204 09:42:13.443782781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6116787Z 
2025-12-04T10:11:57.6117223Z [W1204 09:42:13.443940394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6117227Z 
2025-12-04T10:11:57.6117321Z ('RERUN', {'yellow': True}) [11.1149s] [100%]
2025-12-04T10:11:57.6118052Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:42:14.656281917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6118139Z 
2025-12-04T10:11:57.6118434Z [W1204 09:42:14.656872107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6118438Z 
2025-12-04T10:11:57.6118725Z [W1204 09:42:14.657029119 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6118728Z 
2025-12-04T10:11:57.6119017Z [W1204 09:42:14.659964938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6119022Z 
2025-12-04T10:11:57.6119312Z [W1204 09:42:14.660562619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6119315Z 
2025-12-04T10:11:57.6119599Z [W1204 09:42:14.660711821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6119609Z 
2025-12-04T10:11:57.6119927Z [W1204 09:42:14.665246097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6119930Z 
2025-12-04T10:11:57.6120217Z [W1204 09:42:14.665717705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6120220Z 
2025-12-04T10:11:57.6120511Z [W1204 09:42:14.665879758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6120514Z 
2025-12-04T10:11:57.6120773Z ('RERUN', {'yellow': True}) [0.4554s] [100%]
2025-12-04T10:11:57.6121510Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:42:15.108945923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6121561Z 
2025-12-04T10:11:57.6121853Z [W1204 09:42:15.109473212 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6121856Z 
2025-12-04T10:11:57.6122149Z [W1204 09:42:15.109616114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6122152Z 
2025-12-04T10:11:57.6122436Z [W1204 09:42:15.112530143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6122439Z 
2025-12-04T10:11:57.6122733Z [W1204 09:42:15.113085133 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6122740Z 
2025-12-04T10:11:57.6123025Z [W1204 09:42:15.113223065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6123032Z 
2025-12-04T10:11:57.6123317Z [W1204 09:42:15.117642700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6123320Z 
2025-12-04T10:11:57.6123608Z [W1204 09:42:15.118097128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6123611Z 
2025-12-04T10:11:57.6123895Z [W1204 09:42:15.118232350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6123898Z 
2025-12-04T10:11:57.6123966Z FAILED [0.4464s] [100%]
2025-12-04T10:11:57.6123969Z 
2025-12-04T10:11:57.6124055Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6124353Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6124428Z Traceback (most recent call last):
2025-12-04T10:11:57.6124770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6124838Z     method(*args, **kwargs)
2025-12-04T10:11:57.6125130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6125192Z     method(*args, **kwargs)
2025-12-04T10:11:57.6125485Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6125544Z     with policy():
2025-12-04T10:11:57.6125842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6125906Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6126718Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6126725Z 
2025-12-04T10:11:57.6126854Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6127376Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6127379Z 
2025-12-04T10:11:57.6127542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6127733Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6127827Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6128182Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6128361Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6128425Z graph_break []
2025-12-04T10:11:57.6128548Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6129241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6129317Z   if out == self.unknown_value:
2025-12-04T10:11:57.6129608Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6129684Z Traceback (most recent call last):
2025-12-04T10:11:57.6129979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6130044Z     method(*args, **kwargs)
2025-12-04T10:11:57.6130337Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6130399Z     method(*args, **kwargs)
2025-12-04T10:11:57.6130686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6130749Z     with policy():
2025-12-04T10:11:57.6131040Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6131111Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6131935Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6131977Z 
2025-12-04T10:11:57.6132107Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6132630Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6132633Z 
2025-12-04T10:11:57.6132788Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6132917Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6133008Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6133364Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6133490Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6133550Z graph_break []
2025-12-04T10:11:57.6133675Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6134362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6134434Z   if out == self.unknown_value:
2025-12-04T10:11:57.6134563Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6134652Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6134849Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6135193Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6135283Z graph_break []
2025-12-04T10:11:57.6135371Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6135659Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6135735Z Traceback (most recent call last):
2025-12-04T10:11:57.6136043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6136111Z     method(*args, **kwargs)
2025-12-04T10:11:57.6136407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6136473Z     method(*args, **kwargs)
2025-12-04T10:11:57.6136758Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6136821Z     with policy():
2025-12-04T10:11:57.6137111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6137185Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6138006Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6138010Z 
2025-12-04T10:11:57.6138132Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6138661Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6138665Z 
2025-12-04T10:11:57.6138819Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6138987Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6139078Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6139417Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6139544Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6139602Z graph_break []
2025-12-04T10:11:57.6139729Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6140417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6140486Z   if out == self.unknown_value:
2025-12-04T10:11:57.6140615Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6140706Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6140830Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6141168Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6141226Z graph_break []
2025-12-04T10:11:57.6141352Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6141437Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6141631Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6141977Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6142068Z graph_break []
2025-12-04T10:11:57.6142564Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6cbd45f232782bc2.xml -
2025-12-04T10:11:57.6142664Z =========================== short test summary info ============================
2025-12-04T10:11:57.6143966Z FAILED [0.4464s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6143971Z 
2025-12-04T10:11:57.6144091Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6144611Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6144621Z 
2025-12-04T10:11:57.6144777Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6144878Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6144996Z ================== 1 failed, 57 deselected, 2 rerun in 12.04s ==================
2025-12-04T10:11:57.6145054Z Got exit code 1
2025-12-04T10:11:57.6145529Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6145779Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.6146044Z W1204 09:42:21.748000 45456 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6146472Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d890e4a6cbb89712.xml
2025-12-04T10:11:57.6146567Z ============================= test session starts ==============================
2025-12-04T10:11:57.6146773Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6146857Z cachedir: .pytest_cache
2025-12-04T10:11:57.6147163Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6147244Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6147308Z configfile: pytest.ini
2025-12-04T10:11:57.6147624Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6147755Z collecting ... collected 58 items / 20 deselected / 38 selected
2025-12-04T10:11:57.6147843Z stepcurrent: skipping 20 already run items.
2025-12-04T10:11:57.6147912Z Running 38 items in this shard
2025-12-04T10:11:57.6147916Z 
2025-12-04T10:11:57.6148412Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8691s] [  2%]
2025-12-04T10:11:57.6148892Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4514s] [  2%]
2025-12-04T10:11:57.6149404Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4366s] [  2%]
2025-12-04T10:11:57.6149409Z 
2025-12-04T10:11:57.6149493Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6149826Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6149899Z Traceback (most recent call last):
2025-12-04T10:11:57.6150199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6150268Z     method(*args, **kwargs)
2025-12-04T10:11:57.6150557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6150621Z     method(*args, **kwargs)
2025-12-04T10:11:57.6150914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6150973Z     with policy():
2025-12-04T10:11:57.6151273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6151341Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6152131Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6152139Z 
2025-12-04T10:11:57.6152265Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6152784Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6152787Z 
2025-12-04T10:11:57.6152947Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6153072Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6153171Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6153557Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6153697Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6153762Z graph_break []
2025-12-04T10:11:57.6154050Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6154121Z Traceback (most recent call last):
2025-12-04T10:11:57.6154427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6154491Z     method(*args, **kwargs)
2025-12-04T10:11:57.6154791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6154855Z     method(*args, **kwargs)
2025-12-04T10:11:57.6155148Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6155212Z     with policy():
2025-12-04T10:11:57.6155506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6155572Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6156470Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6156475Z 
2025-12-04T10:11:57.6156610Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6157129Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6157171Z 
2025-12-04T10:11:57.6157327Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6157457Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6157551Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6157896Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6158033Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6158093Z graph_break []
2025-12-04T10:11:57.6158218Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6158310Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6158442Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6158791Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6158847Z graph_break []
2025-12-04T10:11:57.6158929Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6159222Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6159294Z Traceback (most recent call last):
2025-12-04T10:11:57.6159599Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6159662Z     method(*args, **kwargs)
2025-12-04T10:11:57.6160002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6160072Z     method(*args, **kwargs)
2025-12-04T10:11:57.6160406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6160466Z     with policy():
2025-12-04T10:11:57.6160765Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6160833Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6161646Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6161650Z 
2025-12-04T10:11:57.6161772Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6162292Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6162299Z 
2025-12-04T10:11:57.6162452Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6162575Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6162672Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6163009Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6163209Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6163270Z graph_break []
2025-12-04T10:11:57.6163394Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6163482Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6163638Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6163979Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6164039Z graph_break []
2025-12-04T10:11:57.6164160Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6164250Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6164371Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6164711Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6164772Z graph_break []
2025-12-04T10:11:57.6165258Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d890e4a6cbb89712.xml -
2025-12-04T10:11:57.6165358Z =========================== short test summary info ============================
2025-12-04T10:11:57.6166644Z FAILED [0.4366s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6166648Z 
2025-12-04T10:11:57.6166772Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6167288Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6167329Z 
2025-12-04T10:11:57.6167479Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6167586Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6167703Z ================== 1 failed, 20 deselected, 2 rerun in 2.78s ===================
2025-12-04T10:11:57.6167761Z Got exit code 1
2025-12-04T10:11:57.6167828Z Retrying single test...
2025-12-04T10:11:57.6168090Z W1204 09:42:31.462000 45637 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6168482Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b4568ad5eb5915b3.xml
2025-12-04T10:11:57.6168576Z ============================= test session starts ==============================
2025-12-04T10:11:57.6168780Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6168864Z cachedir: .pytest_cache
2025-12-04T10:11:57.6169170Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6169251Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6169315Z configfile: pytest.ini
2025-12-04T10:11:57.6169629Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6169760Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6170394Z stepcurrent: skipping 20 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6170465Z Running 1 items in this shard
2025-12-04T10:11:57.6170474Z 
2025-12-04T10:11:57.6171197Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:42:32.515644840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6171237Z 
2025-12-04T10:11:57.6171537Z [W1204 09:42:41.621200903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6171540Z 
2025-12-04T10:11:57.6171835Z [W1204 09:42:41.621457927 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6171838Z 
2025-12-04T10:11:57.6172128Z [W1204 09:42:41.627336527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6172131Z 
2025-12-04T10:11:57.6172422Z [W1204 09:42:41.627913067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6172428Z 
2025-12-04T10:11:57.6172713Z [W1204 09:42:41.628091010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6172716Z 
2025-12-04T10:11:57.6173005Z [W1204 09:42:41.633645025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6173008Z 
2025-12-04T10:11:57.6173299Z [W1204 09:42:41.634200354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6173303Z 
2025-12-04T10:11:57.6173595Z [W1204 09:42:41.634377008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6173598Z 
2025-12-04T10:11:57.6173678Z ('RERUN', {'yellow': True}) [10.9655s] [100%]
2025-12-04T10:11:57.6174396Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:42:42.808326756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6174441Z 
2025-12-04T10:11:57.6174730Z [W1204 09:42:42.808905816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6174733Z 
2025-12-04T10:11:57.6175022Z [W1204 09:42:42.809051988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6175025Z 
2025-12-04T10:11:57.6175317Z [W1204 09:42:42.812024919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6175321Z 
2025-12-04T10:11:57.6175606Z [W1204 09:42:42.812598389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6175612Z 
2025-12-04T10:11:57.6175901Z [W1204 09:42:42.812738631 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6175904Z 
2025-12-04T10:11:57.6176190Z [W1204 09:42:42.817269319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6176194Z 
2025-12-04T10:11:57.6176487Z [W1204 09:42:42.817727867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6176491Z 
2025-12-04T10:11:57.6176842Z [W1204 09:42:42.817865879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6176846Z 
2025-12-04T10:11:57.6176930Z ('RERUN', {'yellow': True}) [0.4159s] [100%]
2025-12-04T10:11:57.6177642Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:42:43.222606118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6177681Z 
2025-12-04T10:11:57.6177965Z [W1204 09:42:43.223180637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6177968Z 
2025-12-04T10:11:57.6178260Z [W1204 09:42:43.223328410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6178263Z 
2025-12-04T10:11:57.6178556Z [W1204 09:42:43.226272680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6178559Z 
2025-12-04T10:11:57.6178849Z [W1204 09:42:43.226827100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6178853Z 
2025-12-04T10:11:57.6179142Z [W1204 09:42:43.226969232 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6179146Z 
2025-12-04T10:11:57.6179434Z [W1204 09:42:43.231474489 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6179437Z 
2025-12-04T10:11:57.6179735Z [W1204 09:42:43.231934857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6179738Z 
2025-12-04T10:11:57.6180034Z [W1204 09:42:43.232073049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6180037Z 
2025-12-04T10:11:57.6180097Z FAILED [0.4135s] [100%]
2025-12-04T10:11:57.6180101Z 
2025-12-04T10:11:57.6180184Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6180494Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6180606Z Traceback (most recent call last):
2025-12-04T10:11:57.6180914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6180979Z     method(*args, **kwargs)
2025-12-04T10:11:57.6181480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6187853Z     method(*args, **kwargs)
2025-12-04T10:11:57.6188257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6188327Z     with policy():
2025-12-04T10:11:57.6188655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6188736Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6189557Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6189562Z 
2025-12-04T10:11:57.6189704Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6190239Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6190364Z 
2025-12-04T10:11:57.6190542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6190681Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6190783Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6191184Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6191321Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6191386Z graph_break []
2025-12-04T10:11:57.6191517Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6192226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6192312Z   if out == self.unknown_value:
2025-12-04T10:11:57.6192619Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6192704Z Traceback (most recent call last):
2025-12-04T10:11:57.6193013Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6193081Z     method(*args, **kwargs)
2025-12-04T10:11:57.6193378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6193443Z     method(*args, **kwargs)
2025-12-04T10:11:57.6193730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6193796Z     with policy():
2025-12-04T10:11:57.6194093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6194164Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6194978Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6195024Z 
2025-12-04T10:11:57.6195160Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6195689Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6195693Z 
2025-12-04T10:11:57.6195856Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6195994Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6196093Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6196442Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6196581Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6196640Z graph_break []
2025-12-04T10:11:57.6196770Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6197472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6197544Z   if out == self.unknown_value:
2025-12-04T10:11:57.6197741Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6197834Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6197960Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6198304Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6198397Z graph_break []
2025-12-04T10:11:57.6198484Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6198779Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6198857Z Traceback (most recent call last):
2025-12-04T10:11:57.6199161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6199227Z     method(*args, **kwargs)
2025-12-04T10:11:57.6199523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6199587Z     method(*args, **kwargs)
2025-12-04T10:11:57.6199963Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6200033Z     with policy():
2025-12-04T10:11:57.6200330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6200398Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6201355Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6201360Z 
2025-12-04T10:11:57.6201497Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6202024Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6202069Z 
2025-12-04T10:11:57.6202231Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6202363Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6202456Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6202815Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6202949Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6203009Z graph_break []
2025-12-04T10:11:57.6203143Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6203832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6203906Z   if out == self.unknown_value:
2025-12-04T10:11:57.6204037Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6204127Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6204250Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6204593Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6204651Z graph_break []
2025-12-04T10:11:57.6204779Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6204933Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6205054Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6205396Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6205488Z graph_break []
2025-12-04T10:11:57.6205985Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b4568ad5eb5915b3.xml -
2025-12-04T10:11:57.6206087Z =========================== short test summary info ============================
2025-12-04T10:11:57.6207380Z FAILED [0.4135s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6207391Z 
2025-12-04T10:11:57.6207516Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6208038Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6208042Z 
2025-12-04T10:11:57.6208204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6208305Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6208426Z ================== 1 failed, 57 deselected, 2 rerun in 11.82s ==================
2025-12-04T10:11:57.6208487Z Got exit code 1
2025-12-04T10:11:57.6208554Z Retrying single test...
2025-12-04T10:11:57.6208820Z W1204 09:42:49.902000 45823 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6209208Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ebe2595646f336e.xml
2025-12-04T10:11:57.6209343Z ============================= test session starts ==============================
2025-12-04T10:11:57.6209558Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6209625Z cachedir: .pytest_cache
2025-12-04T10:11:57.6209940Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6210017Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6210082Z configfile: pytest.ini
2025-12-04T10:11:57.6210406Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6210535Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6211110Z stepcurrent: skipping 20 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6211184Z Running 1 items in this shard
2025-12-04T10:11:57.6211187Z 
2025-12-04T10:11:57.6211914Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:42:51.957457567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6211926Z 
2025-12-04T10:11:57.6212289Z [W1204 09:43:00.086688388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6212293Z 
2025-12-04T10:11:57.6212583Z [W1204 09:43:00.086952732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6212587Z 
2025-12-04T10:11:57.6212930Z [W1204 09:43:00.092777102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6212935Z 
2025-12-04T10:11:57.6213221Z [W1204 09:43:00.093358401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6213224Z 
2025-12-04T10:11:57.6213525Z [W1204 09:43:00.093540615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6213528Z 
2025-12-04T10:11:57.6213816Z [W1204 09:43:00.098864275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6213822Z 
2025-12-04T10:11:57.6214110Z [W1204 09:43:00.099412175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6214114Z 
2025-12-04T10:11:57.6214400Z [W1204 09:43:00.099586028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6214406Z 
2025-12-04T10:11:57.6214492Z ('RERUN', {'yellow': True}) [10.9920s] [100%]
2025-12-04T10:11:57.6215207Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:43:01.274102924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6215211Z 
2025-12-04T10:11:57.6215502Z [W1204 09:43:01.274665314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6215508Z 
2025-12-04T10:11:57.6215798Z [W1204 09:43:01.274806496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6215802Z 
2025-12-04T10:11:57.6216088Z [W1204 09:43:01.277739646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6216131Z 
2025-12-04T10:11:57.6216421Z [W1204 09:43:01.278302666 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6216425Z 
2025-12-04T10:11:57.6216710Z [W1204 09:43:01.278443288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6216714Z 
2025-12-04T10:11:57.6217193Z [W1204 09:43:01.282970466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6217197Z 
2025-12-04T10:11:57.6217501Z [W1204 09:43:01.283430913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6217505Z 
2025-12-04T10:11:57.6217796Z [W1204 09:43:01.283568816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6217803Z 
2025-12-04T10:11:57.6217882Z ('RERUN', {'yellow': True}) [0.4112s] [100%]
2025-12-04T10:11:57.6218599Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:43:01.682474488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6218609Z 
2025-12-04T10:11:57.6218894Z [W1204 09:43:01.683048378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6219020Z 
2025-12-04T10:11:57.6219312Z [W1204 09:43:01.683189090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6219315Z 
2025-12-04T10:11:57.6219604Z [W1204 09:43:01.686098840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6219655Z 
2025-12-04T10:11:57.6219941Z [W1204 09:43:01.686647129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6219944Z 
2025-12-04T10:11:57.6220232Z [W1204 09:43:01.686785582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6220235Z 
2025-12-04T10:11:57.6220517Z [W1204 09:43:01.691297199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6220520Z 
2025-12-04T10:11:57.6220812Z [W1204 09:43:01.691752176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6220815Z 
2025-12-04T10:11:57.6221101Z [W1204 09:43:01.691888469 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6221107Z 
2025-12-04T10:11:57.6221172Z FAILED [0.4084s] [100%]
2025-12-04T10:11:57.6221175Z 
2025-12-04T10:11:57.6221260Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6221555Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6221633Z Traceback (most recent call last):
2025-12-04T10:11:57.6221945Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6222014Z     method(*args, **kwargs)
2025-12-04T10:11:57.6222311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6222374Z     method(*args, **kwargs)
2025-12-04T10:11:57.6222675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6222789Z     with policy():
2025-12-04T10:11:57.6223082Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6223153Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6223948Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6223953Z 
2025-12-04T10:11:57.6224086Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6224609Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6224616Z 
2025-12-04T10:11:57.6224778Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6224911Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6225005Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6225354Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6225480Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6225539Z graph_break []
2025-12-04T10:11:57.6226052Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6226749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6226868Z   if out == self.unknown_value:
2025-12-04T10:11:57.6227168Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6227242Z Traceback (most recent call last):
2025-12-04T10:11:57.6227541Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6227604Z     method(*args, **kwargs)
2025-12-04T10:11:57.6227892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6227958Z     method(*args, **kwargs)
2025-12-04T10:11:57.6228246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6228310Z     with policy():
2025-12-04T10:11:57.6228601Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6228670Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6229480Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6229484Z 
2025-12-04T10:11:57.6229611Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6230138Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6230142Z 
2025-12-04T10:11:57.6230299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6230430Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6230573Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6230924Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6231052Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6231111Z graph_break []
2025-12-04T10:11:57.6231235Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6231939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6232008Z   if out == self.unknown_value:
2025-12-04T10:11:57.6232135Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6232228Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6232353Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6232697Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6232754Z graph_break []
2025-12-04T10:11:57.6232838Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6233130Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6233278Z Traceback (most recent call last):
2025-12-04T10:11:57.6233580Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6233644Z     method(*args, **kwargs)
2025-12-04T10:11:57.6233962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6234031Z     method(*args, **kwargs)
2025-12-04T10:11:57.6234317Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6234380Z     with policy():
2025-12-04T10:11:57.6234668Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6234733Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6235547Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6235551Z 
2025-12-04T10:11:57.6235676Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6236197Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6236200Z 
2025-12-04T10:11:57.6236353Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6236477Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6236573Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6236915Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6237043Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6237100Z graph_break []
2025-12-04T10:11:57.6237221Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6237955Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6238022Z   if out == self.unknown_value:
2025-12-04T10:11:57.6238150Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6238239Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6238360Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6238706Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6238764Z graph_break []
2025-12-04T10:11:57.6238886Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6238983Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6239106Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6239449Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6239517Z graph_break []
2025-12-04T10:11:57.6240050Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ebe2595646f336e.xml -
2025-12-04T10:11:57.6240243Z =========================== short test summary info ============================
2025-12-04T10:11:57.6241546Z FAILED [0.4084s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6241586Z 
2025-12-04T10:11:57.6241715Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6242234Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6242238Z 
2025-12-04T10:11:57.6242399Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6242505Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6242621Z ================== 1 failed, 57 deselected, 2 rerun in 11.84s ==================
2025-12-04T10:11:57.6242694Z Got exit code 1
2025-12-04T10:11:57.6243168Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6243421Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.6243694Z W1204 09:43:08.326000 46009 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6244084Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cfc45be16d95a5ee.xml
2025-12-04T10:11:57.6244186Z ============================= test session starts ==============================
2025-12-04T10:11:57.6244396Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6244481Z cachedir: .pytest_cache
2025-12-04T10:11:57.6244789Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6244904Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6244973Z configfile: pytest.ini
2025-12-04T10:11:57.6245288Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6245416Z collecting ... collected 58 items / 21 deselected / 37 selected
2025-12-04T10:11:57.6245511Z stepcurrent: skipping 21 already run items.
2025-12-04T10:11:57.6245579Z Running 37 items in this shard
2025-12-04T10:11:57.6245583Z 
2025-12-04T10:11:57.6246091Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9485s] [  2%]
2025-12-04T10:11:57.6246579Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5423s] [  2%]
2025-12-04T10:11:57.6247027Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.5353s] [  2%]
2025-12-04T10:11:57.6247031Z 
2025-12-04T10:11:57.6247114Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6247401Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6247480Z Traceback (most recent call last):
2025-12-04T10:11:57.6247851Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6247916Z     method(*args, **kwargs)
2025-12-04T10:11:57.6248209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6248308Z     method(*args, **kwargs)
2025-12-04T10:11:57.6248596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6248656Z     with policy():
2025-12-04T10:11:57.6248947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6249017Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6249810Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6249814Z 
2025-12-04T10:11:57.6249943Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6250464Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6250470Z 
2025-12-04T10:11:57.6250629Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6250753Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6250848Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6251399Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6251532Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6251590Z graph_break []
2025-12-04T10:11:57.6251901Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6252021Z Traceback (most recent call last):
2025-12-04T10:11:57.6252335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6252405Z     method(*args, **kwargs)
2025-12-04T10:11:57.6252698Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6252767Z     method(*args, **kwargs)
2025-12-04T10:11:57.6253054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6253121Z     with policy():
2025-12-04T10:11:57.6253419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6253486Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6254294Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6254301Z 
2025-12-04T10:11:57.6254434Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6254967Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6254970Z 
2025-12-04T10:11:57.6255198Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6255333Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6255437Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6255981Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6256150Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6256209Z graph_break []
2025-12-04T10:11:57.6256336Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6256431Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6256554Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6257098Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6257158Z graph_break []
2025-12-04T10:11:57.6257246Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6257547Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6257624Z Traceback (most recent call last):
2025-12-04T10:11:57.6257922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6257991Z     method(*args, **kwargs)
2025-12-04T10:11:57.6258280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6258351Z     method(*args, **kwargs)
2025-12-04T10:11:57.6258638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6258699Z     with policy():
2025-12-04T10:11:57.6258993Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6259098Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6259906Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6259911Z 
2025-12-04T10:11:57.6260040Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6260563Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6260567Z 
2025-12-04T10:11:57.6260732Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6260861Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6260960Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6261495Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6261623Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6261685Z graph_break []
2025-12-04T10:11:57.6261808Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6261903Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6262086Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6262621Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6262723Z graph_break []
2025-12-04T10:11:57.6262850Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6262937Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6263062Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6263597Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6263661Z graph_break []
2025-12-04T10:11:57.6264157Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cfc45be16d95a5ee.xml -
2025-12-04T10:11:57.6264273Z =========================== short test summary info ============================
2025-12-04T10:11:57.6265564Z FAILED [0.5353s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6265568Z 
2025-12-04T10:11:57.6265697Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6266229Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6266233Z 
2025-12-04T10:11:57.6266390Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6266536Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6266653Z ================== 1 failed, 21 deselected, 2 rerun in 3.05s ===================
2025-12-04T10:11:57.6266714Z Got exit code 1
2025-12-04T10:11:57.6266783Z Retrying single test...
2025-12-04T10:11:57.6267044Z W1204 09:43:18.011000 46191 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6267440Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4b3d7c6eebbf264b.xml
2025-12-04T10:11:57.6267538Z ============================= test session starts ==============================
2025-12-04T10:11:57.6267747Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6267817Z cachedir: .pytest_cache
2025-12-04T10:11:57.6268120Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6268208Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6268274Z configfile: pytest.ini
2025-12-04T10:11:57.6268593Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6268727Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6269292Z stepcurrent: skipping 21 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6269446Z Running 1 items in this shard
2025-12-04T10:11:57.6269455Z 
2025-12-04T10:11:57.6270182Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:43:19.611299086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6270224Z 
2025-12-04T10:11:57.6270524Z [W1204 09:43:28.666942859 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6270528Z 
2025-12-04T10:11:57.6270820Z [W1204 09:43:28.667200783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6270823Z 
2025-12-04T10:11:57.6271109Z [W1204 09:43:28.673340288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6271112Z 
2025-12-04T10:11:57.6271405Z [W1204 09:43:28.673991619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6271408Z 
2025-12-04T10:11:57.6271691Z [W1204 09:43:28.674171112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6271697Z 
2025-12-04T10:11:57.6271985Z [W1204 09:43:28.679777108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6271988Z 
2025-12-04T10:11:57.6272273Z [W1204 09:43:28.680356108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6272276Z 
2025-12-04T10:11:57.6272564Z [W1204 09:43:28.680529461 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6272568Z 
2025-12-04T10:11:57.6272654Z ('RERUN', {'yellow': True}) [11.0061s] [100%]
2025-12-04T10:11:57.6273371Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:43:29.475063069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6273414Z 
2025-12-04T10:11:57.6273702Z [W1204 09:43:29.475604259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6273705Z 
2025-12-04T10:11:57.6273988Z [W1204 09:43:29.475741531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6273991Z 
2025-12-04T10:11:57.6274279Z [W1204 09:43:29.478604750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6274282Z 
2025-12-04T10:11:57.6274573Z [W1204 09:43:29.479048837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6274576Z 
2025-12-04T10:11:57.6274867Z [W1204 09:43:29.479185610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6274872Z 
2025-12-04T10:11:57.6275161Z [W1204 09:43:29.483671307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6275164Z 
2025-12-04T10:11:57.6275454Z [W1204 09:43:29.484138415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6275457Z 
2025-12-04T10:11:57.6275755Z [W1204 09:43:29.484275387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6275759Z 
2025-12-04T10:11:57.6275911Z ('RERUN', {'yellow': True}) [0.4949s] [100%]
2025-12-04T10:11:57.6276630Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:43:30.969627815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6276668Z 
2025-12-04T10:11:57.6276955Z [W1204 09:43:30.970201705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6276962Z 
2025-12-04T10:11:57.6277249Z [W1204 09:43:30.970346997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6277252Z 
2025-12-04T10:11:57.6277536Z [W1204 09:43:30.973252127 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6277540Z 
2025-12-04T10:11:57.6277832Z [W1204 09:43:30.973704675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6277836Z 
2025-12-04T10:11:57.6278121Z [W1204 09:43:30.973845537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6278127Z 
2025-12-04T10:11:57.6278420Z [W1204 09:43:30.978381084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6278423Z 
2025-12-04T10:11:57.6278707Z [W1204 09:43:30.978840542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6278718Z 
2025-12-04T10:11:57.6279003Z [W1204 09:43:30.978979215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6279007Z 
2025-12-04T10:11:57.6279070Z FAILED [0.4894s] [100%]
2025-12-04T10:11:57.6279076Z 
2025-12-04T10:11:57.6279167Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6279459Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6279541Z Traceback (most recent call last):
2025-12-04T10:11:57.6279934Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6280002Z     method(*args, **kwargs)
2025-12-04T10:11:57.6280306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6280369Z     method(*args, **kwargs)
2025-12-04T10:11:57.6280658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6280721Z     with policy():
2025-12-04T10:11:57.6281019Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6281089Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6281886Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6281893Z 
2025-12-04T10:11:57.6282025Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6282546Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6282549Z 
2025-12-04T10:11:57.6282712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6282915Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6283014Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6283565Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6283731Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6283790Z graph_break []
2025-12-04T10:11:57.6283918Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6284607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6284685Z   if out == self.unknown_value:
2025-12-04T10:11:57.6284981Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6285056Z Traceback (most recent call last):
2025-12-04T10:11:57.6285355Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6285424Z     method(*args, **kwargs)
2025-12-04T10:11:57.6285713Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6285779Z     method(*args, **kwargs)
2025-12-04T10:11:57.6286064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6286126Z     with policy():
2025-12-04T10:11:57.6286419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6286488Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6287294Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6287337Z 
2025-12-04T10:11:57.6287477Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6287998Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6288002Z 
2025-12-04T10:11:57.6288160Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6288287Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6288387Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6288932Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6289065Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6289123Z graph_break []
2025-12-04T10:11:57.6289247Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6289936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6290007Z   if out == self.unknown_value:
2025-12-04T10:11:57.6290201Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6290294Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6290416Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6290956Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6291051Z graph_break []
2025-12-04T10:11:57.6291139Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6291428Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6291503Z Traceback (most recent call last):
2025-12-04T10:11:57.6291801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6291867Z     method(*args, **kwargs)
2025-12-04T10:11:57.6292155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6292222Z     method(*args, **kwargs)
2025-12-04T10:11:57.6292511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6292576Z     with policy():
2025-12-04T10:11:57.6292869Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6292936Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6293748Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6293756Z 
2025-12-04T10:11:57.6293883Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6294408Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6294465Z 
2025-12-04T10:11:57.6294622Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6294750Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6294843Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6295382Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6295514Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6295572Z graph_break []
2025-12-04T10:11:57.6295707Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6296401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6296474Z   if out == self.unknown_value:
2025-12-04T10:11:57.6296601Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6296690Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6296811Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6297418Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6297483Z graph_break []
2025-12-04T10:11:57.6297608Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6297697Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6297855Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6298398Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6298457Z graph_break []
2025-12-04T10:11:57.6298948Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4b3d7c6eebbf264b.xml -
2025-12-04T10:11:57.6299058Z =========================== short test summary info ============================
2025-12-04T10:11:57.6300337Z FAILED [0.4894s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6300351Z 
2025-12-04T10:11:57.6300475Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6300995Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6300999Z 
2025-12-04T10:11:57.6301162Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6301268Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6301387Z ================== 1 failed, 57 deselected, 2 rerun in 12.01s ==================
2025-12-04T10:11:57.6301451Z Got exit code 1
2025-12-04T10:11:57.6301552Z Retrying single test...
2025-12-04T10:11:57.6301821Z W1204 09:43:36.589000 46377 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6302211Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7323c5eff762fde9.xml
2025-12-04T10:11:57.6302306Z ============================= test session starts ==============================
2025-12-04T10:11:57.6302519Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6302586Z cachedir: .pytest_cache
2025-12-04T10:11:57.6302896Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6302973Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6303040Z configfile: pytest.ini
2025-12-04T10:11:57.6303358Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6303489Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6304061Z stepcurrent: skipping 21 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6304136Z Running 1 items in this shard
2025-12-04T10:11:57.6304140Z 
2025-12-04T10:11:57.6304928Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:43:38.196651915 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6304933Z 
2025-12-04T10:11:57.6305232Z [W1204 09:43:47.181403929 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6305272Z 
2025-12-04T10:11:57.6305559Z [W1204 09:43:47.181675273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6305563Z 
2025-12-04T10:11:57.6305850Z [W1204 09:43:47.187821698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6305854Z 
2025-12-04T10:11:57.6306138Z [W1204 09:43:47.188432369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6306141Z 
2025-12-04T10:11:57.6306434Z [W1204 09:43:47.188618582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6306437Z 
2025-12-04T10:11:57.6306723Z [W1204 09:43:47.194156357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6306729Z 
2025-12-04T10:11:57.6307017Z [W1204 09:43:47.194699746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6307021Z 
2025-12-04T10:11:57.6307304Z [W1204 09:43:47.194861899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6307307Z 
2025-12-04T10:11:57.6307389Z ('RERUN', {'yellow': True}) [10.9491s] [100%]
2025-12-04T10:11:57.6308110Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:43:48.999467135 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6308113Z 
2025-12-04T10:11:57.6308400Z [W1204 09:43:48.000018844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6308440Z 
2025-12-04T10:11:57.6308729Z [W1204 09:43:48.000166587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6308732Z 
2025-12-04T10:11:57.6309020Z [W1204 09:43:48.003081506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6309023Z 
2025-12-04T10:11:57.6309310Z [W1204 09:43:48.003531734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6309313Z 
2025-12-04T10:11:57.6309599Z [W1204 09:43:48.003670716 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6309602Z 
2025-12-04T10:11:57.6309893Z [W1204 09:43:48.008175393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6309897Z 
2025-12-04T10:11:57.6310181Z [W1204 09:43:48.008639441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6310184Z 
2025-12-04T10:11:57.6310477Z [W1204 09:43:48.008776924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6310480Z 
2025-12-04T10:11:57.6310562Z ('RERUN', {'yellow': True}) [0.5063s] [100%]
2025-12-04T10:11:57.6311339Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:43:48.505779800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6311343Z 
2025-12-04T10:11:57.6311636Z [W1204 09:43:48.506315949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6311673Z 
2025-12-04T10:11:57.6311958Z [W1204 09:43:48.506456432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6311961Z 
2025-12-04T10:11:57.6312248Z [W1204 09:43:48.509360101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6312251Z 
2025-12-04T10:11:57.6312535Z [W1204 09:43:48.509810839 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6312538Z 
2025-12-04T10:11:57.6312828Z [W1204 09:43:48.509954611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6312831Z 
2025-12-04T10:11:57.6313127Z [W1204 09:43:48.514430778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6313132Z 
2025-12-04T10:11:57.6313425Z [W1204 09:43:48.514887916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6313428Z 
2025-12-04T10:11:57.6313713Z [W1204 09:43:48.515024878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6313716Z 
2025-12-04T10:11:57.6313778Z FAILED [0.5022s] [100%]
2025-12-04T10:11:57.6313781Z 
2025-12-04T10:11:57.6313868Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6314163Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6314242Z Traceback (most recent call last):
2025-12-04T10:11:57.6314548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6314613Z     method(*args, **kwargs)
2025-12-04T10:11:57.6314911Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6315013Z     method(*args, **kwargs)
2025-12-04T10:11:57.6315307Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6315366Z     with policy():
2025-12-04T10:11:57.6315657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6315728Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6316527Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6316532Z 
2025-12-04T10:11:57.6316664Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6317491Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6317496Z 
2025-12-04T10:11:57.6317670Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6317807Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6317903Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6318582Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6318722Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6318848Z graph_break []
2025-12-04T10:11:57.6318981Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6319675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6319755Z   if out == self.unknown_value:
2025-12-04T10:11:57.6320089Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6320167Z Traceback (most recent call last):
2025-12-04T10:11:57.6320473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6320539Z     method(*args, **kwargs)
2025-12-04T10:11:57.6320826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6320897Z     method(*args, **kwargs)
2025-12-04T10:11:57.6321180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6321244Z     with policy():
2025-12-04T10:11:57.6321546Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6321612Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6322426Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6322430Z 
2025-12-04T10:11:57.6322558Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6323077Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6323136Z 
2025-12-04T10:11:57.6323296Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6323427Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6323519Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6324062Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6324200Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6324259Z graph_break []
2025-12-04T10:11:57.6324384Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6325074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6325154Z   if out == self.unknown_value:
2025-12-04T10:11:57.6325284Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6325375Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6325499Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6326105Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6326166Z graph_break []
2025-12-04T10:11:57.6326289Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6326586Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6326661Z Traceback (most recent call last):
2025-12-04T10:11:57.6326977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6327042Z     method(*args, **kwargs)
2025-12-04T10:11:57.6327337Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6327406Z     method(*args, **kwargs)
2025-12-04T10:11:57.6327698Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6327765Z     with policy():
2025-12-04T10:11:57.6328066Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6328139Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6328965Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6328969Z 
2025-12-04T10:11:57.6329096Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6329623Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6330239Z 
2025-12-04T10:11:57.6330408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6330783Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6331299Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6332036Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6332813Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6333084Z graph_break []
2025-12-04T10:11:57.6333309Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6334226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6335059Z   if out == self.unknown_value:
2025-12-04T10:11:57.6335315Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6335623Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6335924Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6336666Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6337347Z graph_break []
2025-12-04T10:11:57.6337569Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6337959Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6338252Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6338995Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6339710Z graph_break []
2025-12-04T10:11:57.6340291Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7323c5eff762fde9.xml -
2025-12-04T10:11:57.6340963Z =========================== short test summary info ============================
2025-12-04T10:11:57.6342444Z FAILED [0.5022s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6343795Z 
2025-12-04T10:11:57.6343926Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6344661Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6345252Z 
2025-12-04T10:11:57.6345417Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6345755Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6346057Z ================== 1 failed, 57 deselected, 2 rerun in 11.98s ==================
2025-12-04T10:11:57.6346329Z Got exit code 1
2025-12-04T10:11:57.6346905Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6347695Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.6348331Z W1204 09:43:55.153000 46564 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6349053Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db81609099e15efb.xml
2025-12-04T10:11:57.6349611Z ============================= test session starts ==============================
2025-12-04T10:11:57.6349993Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6350346Z cachedir: .pytest_cache
2025-12-04T10:11:57.6350764Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6351220Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6351430Z configfile: pytest.ini
2025-12-04T10:11:57.6351859Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6352383Z collecting ... collected 58 items / 22 deselected / 36 selected
2025-12-04T10:11:57.6352672Z stepcurrent: skipping 22 already run items.
2025-12-04T10:11:57.6352899Z Running 36 items in this shard
2025-12-04T10:11:57.6353025Z 
2025-12-04T10:11:57.6353535Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0162s] [  2%]
2025-12-04T10:11:57.6354677Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6129s] [  2%]
2025-12-04T10:11:57.6355694Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.6145s] [  2%]
2025-12-04T10:11:57.6356259Z 
2025-12-04T10:11:57.6356347Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6356811Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6357258Z Traceback (most recent call last):
2025-12-04T10:11:57.6357716Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6358172Z     method(*args, **kwargs)
2025-12-04T10:11:57.6358586Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6359025Z     method(*args, **kwargs)
2025-12-04T10:11:57.6359426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6359858Z     with policy():
2025-12-04T10:11:57.6360386Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6360824Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6361753Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6362639Z 
2025-12-04T10:11:57.6362769Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6363501Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6364097Z 
2025-12-04T10:11:57.6364263Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6364695Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6365001Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6365526Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6366084Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6366350Z graph_break []
2025-12-04T10:11:57.6366743Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6367186Z Traceback (most recent call last):
2025-12-04T10:11:57.6367627Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6368057Z     method(*args, **kwargs)
2025-12-04T10:11:57.6368477Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6368910Z     method(*args, **kwargs)
2025-12-04T10:11:57.6369312Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6369734Z     with policy():
2025-12-04T10:11:57.6370122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6370560Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6371591Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6372523Z 
2025-12-04T10:11:57.6372652Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6373378Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6373982Z 
2025-12-04T10:11:57.6374141Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6374504Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6374803Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6375326Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6375875Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6376144Z graph_break []
2025-12-04T10:11:57.6376365Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6376670Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6376962Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6377508Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6378003Z graph_break []
2025-12-04T10:11:57.6378182Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6378649Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6379090Z Traceback (most recent call last):
2025-12-04T10:11:57.6379529Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6379969Z     method(*args, **kwargs)
2025-12-04T10:11:57.6380427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6380868Z     method(*args, **kwargs)
2025-12-04T10:11:57.6381272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6381702Z     with policy():
2025-12-04T10:11:57.6382100Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6382544Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6383487Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6384380Z 
2025-12-04T10:11:57.6384513Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6385238Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6385839Z 
2025-12-04T10:11:57.6385996Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6386358Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6386656Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6387242Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6387801Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6388071Z graph_break []
2025-12-04T10:11:57.6388321Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6388622Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6388912Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6389462Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6389940Z graph_break []
2025-12-04T10:11:57.6390154Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6390452Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6390739Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6391279Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6391764Z graph_break []
2025-12-04T10:11:57.6392339Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db81609099e15efb.xml -
2025-12-04T10:11:57.6392997Z =========================== short test summary info ============================
2025-12-04T10:11:57.6394485Z FAILED [0.6145s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6395856Z 
2025-12-04T10:11:57.6395984Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6396714Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6397358Z 
2025-12-04T10:11:57.6397516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6397857Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6398157Z ================== 1 failed, 22 deselected, 2 rerun in 3.27s ===================
2025-12-04T10:11:57.6398422Z Got exit code 1
2025-12-04T10:11:57.6398586Z Retrying single test...
2025-12-04T10:11:57.6398964Z W1204 09:44:04.996000 46753 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6399693Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8169c375ae58c76b.xml
2025-12-04T10:11:57.6400295Z ============================= test session starts ==============================
2025-12-04T10:11:57.6400682Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6401029Z cachedir: .pytest_cache
2025-12-04T10:11:57.6401457Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6401911Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6402122Z configfile: pytest.ini
2025-12-04T10:11:57.6402547Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6403136Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6403918Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6404670Z Running 1 items in this shard
2025-12-04T10:11:57.6404804Z 
2025-12-04T10:11:57.6405536Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:44:06.226747132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6406350Z 
2025-12-04T10:11:57.6406666Z [W1204 09:44:15.367819867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6407050Z 
2025-12-04T10:11:57.6407348Z [W1204 09:44:15.368078722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6407723Z 
2025-12-04T10:11:57.6408018Z [W1204 09:44:15.374006353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6408393Z 
2025-12-04T10:11:57.6408688Z [W1204 09:44:15.374596463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6409061Z 
2025-12-04T10:11:57.6409351Z [W1204 09:44:15.374765436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6409727Z 
2025-12-04T10:11:57.6410017Z [W1204 09:44:15.380166038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6410393Z 
2025-12-04T10:11:57.6410686Z [W1204 09:44:15.380708648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6411056Z 
2025-12-04T10:11:57.6411353Z [W1204 09:44:15.380870440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6411764Z 
2025-12-04T10:11:57.6411856Z ('RERUN', {'yellow': True}) [11.1794s] [100%]
2025-12-04T10:11:57.6412753Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:44:16.731716380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6413560Z 
2025-12-04T10:11:57.6413855Z [W1204 09:44:16.732239558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6414236Z 
2025-12-04T10:11:57.6414534Z [W1204 09:44:16.732388701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6414903Z 
2025-12-04T10:11:57.6415197Z [W1204 09:44:16.735272500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6415568Z 
2025-12-04T10:11:57.6415862Z [W1204 09:44:16.735823010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6416237Z 
2025-12-04T10:11:57.6416526Z [W1204 09:44:16.735963042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6416899Z 
2025-12-04T10:11:57.6417436Z [W1204 09:44:16.740510320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6417813Z 
2025-12-04T10:11:57.6418235Z [W1204 09:44:16.740966347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6418628Z 
2025-12-04T10:11:57.6418920Z [W1204 09:44:16.741107650 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6419347Z 
2025-12-04T10:11:57.6419436Z ('RERUN', {'yellow': True}) [0.5915s] [100%]
2025-12-04T10:11:57.6420318Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:44:17.321714799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6421123Z 
2025-12-04T10:11:57.6421421Z [W1204 09:44:17.322235648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6421799Z 
2025-12-04T10:11:57.6422096Z [W1204 09:44:17.322375030 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6422473Z 
2025-12-04T10:11:57.6422765Z [W1204 09:44:17.325284540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6423139Z 
2025-12-04T10:11:57.6423435Z [W1204 09:44:17.325827559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6423803Z 
2025-12-04T10:11:57.6424098Z [W1204 09:44:17.325966372 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6424464Z 
2025-12-04T10:11:57.6424758Z [W1204 09:44:17.330438319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6425129Z 
2025-12-04T10:11:57.6425420Z [W1204 09:44:17.330894746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6425794Z 
2025-12-04T10:11:57.6426088Z [W1204 09:44:17.331033159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6426460Z 
2025-12-04T10:11:57.6426583Z FAILED [0.5922s] [100%]
2025-12-04T10:11:57.6426690Z 
2025-12-04T10:11:57.6426779Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6427252Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6427714Z Traceback (most recent call last):
2025-12-04T10:11:57.6428175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6428618Z     method(*args, **kwargs)
2025-12-04T10:11:57.6429035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6429484Z     method(*args, **kwargs)
2025-12-04T10:11:57.6429895Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6430323Z     with policy():
2025-12-04T10:11:57.6430721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6431167Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6432093Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6432982Z 
2025-12-04T10:11:57.6433115Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6433944Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6434553Z 
2025-12-04T10:11:57.6434748Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6435123Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6435428Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6435949Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6436515Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6436785Z graph_break []
2025-12-04T10:11:57.6437001Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6437916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6438759Z   if out == self.unknown_value:
2025-12-04T10:11:57.6439179Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6439632Z Traceback (most recent call last):
2025-12-04T10:11:57.6440121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6440564Z     method(*args, **kwargs)
2025-12-04T10:11:57.6440967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6441422Z     method(*args, **kwargs)
2025-12-04T10:11:57.6441829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6442257Z     with policy():
2025-12-04T10:11:57.6442643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6443127Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6444092Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6444989Z 
2025-12-04T10:11:57.6445119Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6445846Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6446450Z 
2025-12-04T10:11:57.6446610Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6446976Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6447280Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6447797Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6448350Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6448620Z graph_break []
2025-12-04T10:11:57.6448832Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6449820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6450653Z   if out == self.unknown_value:
2025-12-04T10:11:57.6450903Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6451238Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6451534Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6452082Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6452568Z graph_break []
2025-12-04T10:11:57.6452742Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6453218Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6453667Z Traceback (most recent call last):
2025-12-04T10:11:57.6454115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6454552Z     method(*args, **kwargs)
2025-12-04T10:11:57.6454960Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6455398Z     method(*args, **kwargs)
2025-12-04T10:11:57.6455801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6456231Z     with policy():
2025-12-04T10:11:57.6456619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6457060Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6458005Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6458909Z 
2025-12-04T10:11:57.6459035Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6459806Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6460403Z 
2025-12-04T10:11:57.6460566Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6460924Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6461228Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6461762Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6462310Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6462571Z graph_break []
2025-12-04T10:11:57.6462787Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6463686Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6464522Z   if out == self.unknown_value:
2025-12-04T10:11:57.6464767Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6465076Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6465491Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6466195Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6466865Z graph_break []
2025-12-04T10:11:57.6467173Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6467601Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6468047Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6468717Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6469252Z graph_break []
2025-12-04T10:11:57.6469958Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8169c375ae58c76b.xml -
2025-12-04T10:11:57.6470739Z =========================== short test summary info ============================
2025-12-04T10:11:57.6472334Z FAILED [0.5922s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6473722Z 
2025-12-04T10:11:57.6473989Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6474808Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6475435Z 
2025-12-04T10:11:57.6475623Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6476131Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6476528Z ================== 1 failed, 57 deselected, 2 rerun in 12.39s ==================
2025-12-04T10:11:57.6476857Z Got exit code 1
2025-12-04T10:11:57.6477154Z Retrying single test...
2025-12-04T10:11:57.6477663Z W1204 09:44:23.934000 46947 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6478481Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ac75bac96a56365f.xml
2025-12-04T10:11:57.6479159Z ============================= test session starts ==============================
2025-12-04T10:11:57.6479637Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6480139Z cachedir: .pytest_cache
2025-12-04T10:11:57.6480711Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6481228Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6481543Z configfile: pytest.ini
2025-12-04T10:11:57.6482118Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6482744Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6483565Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6484441Z Running 1 items in this shard
2025-12-04T10:11:57.6484599Z 
2025-12-04T10:11:57.6485492Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:44:25.182186073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6486333Z 
2025-12-04T10:11:57.6486737Z [W1204 09:44:34.267724860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6487198Z 
2025-12-04T10:11:57.6487574Z [W1204 09:44:34.267991195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6487978Z 
2025-12-04T10:11:57.6488302Z [W1204 09:44:34.273918746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6488722Z 
2025-12-04T10:11:57.6489099Z [W1204 09:44:34.274515536 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6489565Z 
2025-12-04T10:11:57.6489890Z [W1204 09:44:34.274692419 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6490320Z 
2025-12-04T10:11:57.6490642Z [W1204 09:44:34.280217624 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6491029Z 
2025-12-04T10:11:57.6491463Z [W1204 09:44:34.280753093 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6491866Z 
2025-12-04T10:11:57.6492220Z [W1204 09:44:34.280918636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6492619Z 
2025-12-04T10:11:57.6492732Z ('RERUN', {'yellow': True}) [11.1387s] [100%]
2025-12-04T10:11:57.6493787Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:44:35.635248313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6494653Z 
2025-12-04T10:11:57.6494977Z [W1204 09:44:35.635773042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6495382Z 
2025-12-04T10:11:57.6495776Z [W1204 09:44:35.635910984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6496246Z 
2025-12-04T10:11:57.6496612Z [W1204 09:44:35.638895245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6497016Z 
2025-12-04T10:11:57.6497337Z [W1204 09:44:35.639461015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6497773Z 
2025-12-04T10:11:57.6498083Z [W1204 09:44:35.639600088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6498611Z 
2025-12-04T10:11:57.6498936Z [W1204 09:44:35.644200817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6499368Z 
2025-12-04T10:11:57.6499690Z [W1204 09:44:35.644679575 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6500093Z 
2025-12-04T10:11:57.6500494Z [W1204 09:44:35.644816158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6500908Z 
2025-12-04T10:11:57.6501050Z ('RERUN', {'yellow': True}) [0.5994s] [100%]
2025-12-04T10:11:57.6502078Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:44:36.233581086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6502935Z 
2025-12-04T10:11:57.6503303Z [W1204 09:44:36.234118305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6503752Z 
2025-12-04T10:11:57.6504111Z [W1204 09:44:36.234260498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6504528Z 
2025-12-04T10:11:57.6504883Z [W1204 09:44:36.237235039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6505280Z 
2025-12-04T10:11:57.6505701Z [W1204 09:44:36.237793578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6506103Z 
2025-12-04T10:11:57.6506427Z [W1204 09:44:36.237931541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6506873Z 
2025-12-04T10:11:57.6507191Z [W1204 09:44:36.242523190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6507651Z 
2025-12-04T10:11:57.6507986Z [W1204 09:44:36.242988228 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6508433Z 
2025-12-04T10:11:57.6508755Z [W1204 09:44:36.243123640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6509156Z 
2025-12-04T10:11:57.6509272Z FAILED [0.6007s] [100%]
2025-12-04T10:11:57.6509449Z 
2025-12-04T10:11:57.6509581Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6510170Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6510698Z Traceback (most recent call last):
2025-12-04T10:11:57.6511321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6511827Z     method(*args, **kwargs)
2025-12-04T10:11:57.6512315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6512967Z     method(*args, **kwargs)
2025-12-04T10:11:57.6513476Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6513954Z     with policy():
2025-12-04T10:11:57.6514522Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6515061Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6516120Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6517212Z 
2025-12-04T10:11:57.6517386Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6518224Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6518872Z 
2025-12-04T10:11:57.6519137Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6519623Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6520035Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6520819Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6521507Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6521883Z graph_break []
2025-12-04T10:11:57.6522207Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6523301Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6524247Z   if out == self.unknown_value:
2025-12-04T10:11:57.6524820Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6525338Z Traceback (most recent call last):
2025-12-04T10:11:57.6525895Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6526498Z     method(*args, **kwargs)
2025-12-04T10:11:57.6527012Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6527506Z     method(*args, **kwargs)
2025-12-04T10:11:57.6528057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6528596Z     with policy():
2025-12-04T10:11:57.6529083Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6529651Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6530711Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6531672Z 
2025-12-04T10:11:57.6531818Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6532714Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6533413Z 
2025-12-04T10:11:57.6533648Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6534064Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6534536Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6535147Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6535821Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6536180Z graph_break []
2025-12-04T10:11:57.6536488Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6537509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6538440Z   if out == self.unknown_value:
2025-12-04T10:11:57.6538790Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6539226Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6539636Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6540250Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6540869Z graph_break []
2025-12-04T10:11:57.6541222Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6541800Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6542378Z Traceback (most recent call last):
2025-12-04T10:11:57.6542932Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6543481Z     method(*args, **kwargs)
2025-12-04T10:11:57.6544029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6544519Z     method(*args, **kwargs)
2025-12-04T10:11:57.6545037Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6545616Z     with policy():
2025-12-04T10:11:57.6546075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6546626Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6547762Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6548875Z 
2025-12-04T10:11:57.6549121Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6550038Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6550714Z 
2025-12-04T10:11:57.6550921Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6551408Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6551785Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6552481Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6553139Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6553483Z graph_break []
2025-12-04T10:11:57.6553886Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6554884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6555764Z   if out == self.unknown_value:
2025-12-04T10:11:57.6556197Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6556587Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6557005Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6557681Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6558367Z graph_break []
2025-12-04T10:11:57.6558780Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6559219Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6559624Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6560363Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6560948Z graph_break []
2025-12-04T10:11:57.6561670Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ac75bac96a56365f.xml -
2025-12-04T10:11:57.6562482Z =========================== short test summary info ============================
2025-12-04T10:11:57.6564134Z FAILED [0.6007s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6565580Z 
2025-12-04T10:11:57.6565738Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6566618Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6567247Z 
2025-12-04T10:11:57.6567485Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6567902Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6568340Z ================== 1 failed, 57 deselected, 2 rerun in 12.36s ==================
2025-12-04T10:11:57.6568703Z Got exit code 1
2025-12-04T10:11:57.6569332Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6570272Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.6570972Z W1204 09:44:42.903000 47141 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6571779Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-674ad3938f78a3d3.xml
2025-12-04T10:11:57.6572492Z ============================= test session starts ==============================
2025-12-04T10:11:57.6573003Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6573439Z cachedir: .pytest_cache
2025-12-04T10:11:57.6574052Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6574611Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6574872Z configfile: pytest.ini
2025-12-04T10:11:57.6575470Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6576084Z collecting ... collected 58 items / 23 deselected / 35 selected
2025-12-04T10:11:57.6576438Z stepcurrent: skipping 23 already run items.
2025-12-04T10:11:57.6576920Z Running 35 items in this shard
2025-12-04T10:11:57.6577078Z 
2025-12-04T10:11:57.6577647Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8470s] [  2%]
2025-12-04T10:11:57.6578829Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4487s] [  2%]
2025-12-04T10:11:57.6579960Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4446s] [  2%]
2025-12-04T10:11:57.6580515Z 
2025-12-04T10:11:57.6580706Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6581316Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6581867Z Traceback (most recent call last):
2025-12-04T10:11:57.6582430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6583006Z     method(*args, **kwargs)
2025-12-04T10:11:57.6583527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6584074Z     method(*args, **kwargs)
2025-12-04T10:11:57.6584644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6585185Z     with policy():
2025-12-04T10:11:57.6585688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6586276Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6587274Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6588239Z 
2025-12-04T10:11:57.6588400Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6589265Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6589921Z 
2025-12-04T10:11:57.6590129Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6590560Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6595631Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6596234Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6596802Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6597158Z graph_break []
2025-12-04T10:11:57.6597562Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6598016Z Traceback (most recent call last):
2025-12-04T10:11:57.6598490Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6598944Z     method(*args, **kwargs)
2025-12-04T10:11:57.6599360Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6599804Z     method(*args, **kwargs)
2025-12-04T10:11:57.6600305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6600752Z     with policy():
2025-12-04T10:11:57.6601163Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6601614Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6602558Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6603441Z 
2025-12-04T10:11:57.6603614Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6604435Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6605041Z 
2025-12-04T10:11:57.6605204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6605631Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6605946Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6606471Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6607024Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6607295Z graph_break []
2025-12-04T10:11:57.6607515Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6607821Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6608115Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6608659Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6609134Z graph_break []
2025-12-04T10:11:57.6609311Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6609781Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6610233Z Traceback (most recent call last):
2025-12-04T10:11:57.6610689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6611150Z     method(*args, **kwargs)
2025-12-04T10:11:57.6611571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6612004Z     method(*args, **kwargs)
2025-12-04T10:11:57.6612408Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6612840Z     with policy():
2025-12-04T10:11:57.6613234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6613722Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6614653Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6615533Z 
2025-12-04T10:11:57.6615669Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6616396Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6617203Z 
2025-12-04T10:11:57.6617380Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6617759Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6618064Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6618574Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6619120Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6619382Z graph_break []
2025-12-04T10:11:57.6619597Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6619893Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6620328Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6620880Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6621403Z graph_break []
2025-12-04T10:11:57.6621612Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6621907Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6622195Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6622732Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6623205Z graph_break []
2025-12-04T10:11:57.6623789Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-674ad3938f78a3d3.xml -
2025-12-04T10:11:57.6624459Z =========================== short test summary info ============================
2025-12-04T10:11:57.6625929Z FAILED [0.4446s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6627280Z 
2025-12-04T10:11:57.6627409Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6628138Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6628739Z 
2025-12-04T10:11:57.6628899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6629242Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6629539Z ================== 1 failed, 23 deselected, 2 rerun in 2.76s ===================
2025-12-04T10:11:57.6629852Z Got exit code 1
2025-12-04T10:11:57.6630025Z Retrying single test...
2025-12-04T10:11:57.6630399Z W1204 09:44:52.633000 47329 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6631118Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f085b783f0e405ac.xml
2025-12-04T10:11:57.6631223Z ============================= test session starts ==============================
2025-12-04T10:11:57.6631437Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6631504Z cachedir: .pytest_cache
2025-12-04T10:11:57.6631814Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6631896Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6631970Z configfile: pytest.ini
2025-12-04T10:11:57.6632288Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6632419Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6632991Z stepcurrent: skipping 23 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6633065Z Running 1 items in this shard
2025-12-04T10:11:57.6633069Z 
2025-12-04T10:11:57.6633866Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:44:53.685392677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6633902Z 
2025-12-04T10:11:57.6634207Z [W1204 09:45:02.645211950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6634210Z 
2025-12-04T10:11:57.6634508Z [W1204 09:45:02.645474164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6634511Z 
2025-12-04T10:11:57.6634797Z [W1204 09:45:02.651338124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6634800Z 
2025-12-04T10:11:57.6635103Z [W1204 09:45:02.651914024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6635110Z 
2025-12-04T10:11:57.6635398Z [W1204 09:45:02.652082247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6635402Z 
2025-12-04T10:11:57.6635691Z [W1204 09:45:02.657506160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6635696Z 
2025-12-04T10:11:57.6635989Z [W1204 09:45:02.658033379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6635993Z 
2025-12-04T10:11:57.6636278Z [W1204 09:45:02.658188591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6636282Z 
2025-12-04T10:11:57.6636367Z ('RERUN', {'yellow': True}) [10.8188s] [100%]
2025-12-04T10:11:57.6637091Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:45:03.834596647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6637096Z 
2025-12-04T10:11:57.6637391Z [W1204 09:45:03.835130266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6637512Z 
2025-12-04T10:11:57.6637796Z [W1204 09:45:03.835274478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6637800Z 
2025-12-04T10:11:57.6638091Z [W1204 09:45:03.838197638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6638095Z 
2025-12-04T10:11:57.6638383Z [W1204 09:45:03.838765848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6638387Z 
2025-12-04T10:11:57.6638672Z [W1204 09:45:03.838907320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6638679Z 
2025-12-04T10:11:57.6638976Z [W1204 09:45:03.843463178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6638982Z 
2025-12-04T10:11:57.6639271Z [W1204 09:45:03.843920596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6639274Z 
2025-12-04T10:11:57.6639563Z [W1204 09:45:03.844057148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6639566Z 
2025-12-04T10:11:57.6639646Z ('RERUN', {'yellow': True}) [0.4201s] [100%]
2025-12-04T10:11:57.6640483Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:45:04.253293728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6640488Z 
2025-12-04T10:11:57.6640824Z [W1204 09:45:04.253854578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6640829Z 
2025-12-04T10:11:57.6641131Z [W1204 09:45:04.254001940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6641134Z 
2025-12-04T10:11:57.6641423Z [W1204 09:45:04.256955051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6641426Z 
2025-12-04T10:11:57.6641721Z [W1204 09:45:04.257516140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6641730Z 
2025-12-04T10:11:57.6642021Z [W1204 09:45:04.257657612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6642024Z 
2025-12-04T10:11:57.6642311Z [W1204 09:45:04.262205281 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6642317Z 
2025-12-04T10:11:57.6642607Z [W1204 09:45:04.262668038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6642611Z 
2025-12-04T10:11:57.6642905Z [W1204 09:45:04.262804811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6642908Z 
2025-12-04T10:11:57.6642975Z FAILED [0.4161s] [100%]
2025-12-04T10:11:57.6642979Z 
2025-12-04T10:11:57.6643069Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6643459Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6643595Z Traceback (most recent call last):
2025-12-04T10:11:57.6643952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6644094Z     method(*args, **kwargs)
2025-12-04T10:11:57.6644396Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6644460Z     method(*args, **kwargs)
2025-12-04T10:11:57.6644756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6644820Z     with policy():
2025-12-04T10:11:57.6645122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6645191Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6645996Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6646004Z 
2025-12-04T10:11:57.6646144Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6646666Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6646671Z 
2025-12-04T10:11:57.6646848Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6646984Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6647151Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6647513Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6647643Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6647745Z graph_break []
2025-12-04T10:11:57.6647870Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6648576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6648656Z   if out == self.unknown_value:
2025-12-04T10:11:57.6648952Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6649035Z Traceback (most recent call last):
2025-12-04T10:11:57.6649341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6649407Z     method(*args, **kwargs)
2025-12-04T10:11:57.6649702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6649770Z     method(*args, **kwargs)
2025-12-04T10:11:57.6650059Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6650124Z     with policy():
2025-12-04T10:11:57.6650432Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6650503Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6651320Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6651324Z 
2025-12-04T10:11:57.6651458Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6652016Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6652020Z 
2025-12-04T10:11:57.6652180Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6652312Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6652409Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6652767Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6652895Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6652956Z graph_break []
2025-12-04T10:11:57.6653084Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6653779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6653849Z   if out == self.unknown_value:
2025-12-04T10:11:57.6653977Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6654068Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6654195Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6654632Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6654696Z graph_break []
2025-12-04T10:11:57.6654786Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6655115Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6655196Z Traceback (most recent call last):
2025-12-04T10:11:57.6655497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6655561Z     method(*args, **kwargs)
2025-12-04T10:11:57.6655855Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6655920Z     method(*args, **kwargs)
2025-12-04T10:11:57.6656214Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6656280Z     with policy():
2025-12-04T10:11:57.6656573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6656643Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6657463Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6657467Z 
2025-12-04T10:11:57.6657595Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6658126Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6658132Z 
2025-12-04T10:11:57.6658291Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6658422Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6658516Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6658899Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6659029Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6659089Z graph_break []
2025-12-04T10:11:57.6659216Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6659902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6659972Z   if out == self.unknown_value:
2025-12-04T10:11:57.6660097Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6660186Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6660313Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6660654Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6660723Z graph_break []
2025-12-04T10:11:57.6660854Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6660943Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6661064Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6661477Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6661538Z graph_break []
2025-12-04T10:11:57.6662031Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f085b783f0e405ac.xml -
2025-12-04T10:11:57.6662168Z =========================== short test summary info ============================
2025-12-04T10:11:57.6663462Z FAILED [0.4161s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6663466Z 
2025-12-04T10:11:57.6663595Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6664111Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6664122Z 
2025-12-04T10:11:57.6664277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6664381Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6664502Z ================== 1 failed, 57 deselected, 2 rerun in 11.68s ==================
2025-12-04T10:11:57.6664560Z Got exit code 1
2025-12-04T10:11:57.6664625Z Retrying single test...
2025-12-04T10:11:57.6664897Z W1204 09:45:10.881000 47522 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6665286Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-84307678eab5d217.xml
2025-12-04T10:11:57.6665386Z ============================= test session starts ==============================
2025-12-04T10:11:57.6665597Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6665702Z cachedir: .pytest_cache
2025-12-04T10:11:57.6666015Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6666093Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6666160Z configfile: pytest.ini
2025-12-04T10:11:57.6666478Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6666607Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6667188Z stepcurrent: skipping 23 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6667260Z Running 1 items in this shard
2025-12-04T10:11:57.6667264Z 
2025-12-04T10:11:57.6667989Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:45:11.939141588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6668001Z 
2025-12-04T10:11:57.6668304Z [W1204 09:45:21.100585283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6668308Z 
2025-12-04T10:11:57.6668600Z [W1204 09:45:21.100842757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6668607Z 
2025-12-04T10:11:57.6668973Z [W1204 09:45:21.106475104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6668978Z 
2025-12-04T10:11:57.6669268Z [W1204 09:45:21.107027143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6669306Z 
2025-12-04T10:11:57.6669599Z [W1204 09:45:21.107192296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6669603Z 
2025-12-04T10:11:57.6669890Z [W1204 09:45:21.112576928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6669893Z 
2025-12-04T10:11:57.6670183Z [W1204 09:45:21.113090207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6670187Z 
2025-12-04T10:11:57.6670475Z [W1204 09:45:21.113254890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6670478Z 
2025-12-04T10:11:57.6670563Z ('RERUN', {'yellow': True}) [11.0205s] [100%]
2025-12-04T10:11:57.6671283Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:45:22.281394816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6671290Z 
2025-12-04T10:11:57.6671578Z [W1204 09:45:22.281937095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6671585Z 
2025-12-04T10:11:57.6671871Z [W1204 09:45:22.282085178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6671875Z 
2025-12-04T10:11:57.6672164Z [W1204 09:45:22.284985577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6672168Z 
2025-12-04T10:11:57.6672456Z [W1204 09:45:22.285541866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6672496Z 
2025-12-04T10:11:57.6672785Z [W1204 09:45:22.285679179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6672788Z 
2025-12-04T10:11:57.6673076Z [W1204 09:45:22.290204746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6673080Z 
2025-12-04T10:11:57.6673365Z [W1204 09:45:22.290673254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6673368Z 
2025-12-04T10:11:57.6673665Z [W1204 09:45:22.290809187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6673668Z 
2025-12-04T10:11:57.6673747Z ('RERUN', {'yellow': True}) [0.4057s] [100%]
2025-12-04T10:11:57.6674466Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:45:22.686033982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6674472Z 
2025-12-04T10:11:57.6674760Z [W1204 09:45:22.686566471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6674763Z 
2025-12-04T10:11:57.6675055Z [W1204 09:45:22.686711124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6675062Z 
2025-12-04T10:11:57.6675415Z [W1204 09:45:22.689602714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6675419Z 
2025-12-04T10:11:57.6675705Z [W1204 09:45:22.690175834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6675743Z 
2025-12-04T10:11:57.6676037Z [W1204 09:45:22.690324746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6676040Z 
2025-12-04T10:11:57.6676326Z [W1204 09:45:22.694798733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6676330Z 
2025-12-04T10:11:57.6676625Z [W1204 09:45:22.695256360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6676628Z 
2025-12-04T10:11:57.6676917Z [W1204 09:45:22.695392752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6676920Z 
2025-12-04T10:11:57.6676988Z FAILED [0.4025s] [100%]
2025-12-04T10:11:57.6676992Z 
2025-12-04T10:11:57.6677078Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6677375Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6677465Z Traceback (most recent call last):
2025-12-04T10:11:57.6677778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6677848Z     method(*args, **kwargs)
2025-12-04T10:11:57.6678148Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6678211Z     method(*args, **kwargs)
2025-12-04T10:11:57.6678514Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6678575Z     with policy():
2025-12-04T10:11:57.6678871Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6678945Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6679799Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6679803Z 
2025-12-04T10:11:57.6680013Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6680540Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6680544Z 
2025-12-04T10:11:57.6680708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6680835Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6680931Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6681287Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6681417Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6681476Z graph_break []
2025-12-04T10:11:57.6681609Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6682373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6682453Z   if out == self.unknown_value:
2025-12-04T10:11:57.6682747Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6682855Z Traceback (most recent call last):
2025-12-04T10:11:57.6683163Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6683228Z     method(*args, **kwargs)
2025-12-04T10:11:57.6683525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6683593Z     method(*args, **kwargs)
2025-12-04T10:11:57.6683883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6683949Z     with policy():
2025-12-04T10:11:57.6684246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6684313Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6685131Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6685137Z 
2025-12-04T10:11:57.6685268Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6685795Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6685799Z 
2025-12-04T10:11:57.6685955Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6686086Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6686179Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6686525Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6686696Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6686756Z graph_break []
2025-12-04T10:11:57.6686894Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6687593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6687663Z   if out == self.unknown_value:
2025-12-04T10:11:57.6687809Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6687901Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6688030Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6688374Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6688436Z graph_break []
2025-12-04T10:11:57.6688524Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6688817Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6688895Z Traceback (most recent call last):
2025-12-04T10:11:57.6689202Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6689267Z     method(*args, **kwargs)
2025-12-04T10:11:57.6689632Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6689696Z     method(*args, **kwargs)
2025-12-04T10:11:57.6689990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6690085Z     with policy():
2025-12-04T10:11:57.6690381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6690451Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6691264Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6691268Z 
2025-12-04T10:11:57.6691399Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6691918Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6691924Z 
2025-12-04T10:11:57.6692079Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6692208Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6692300Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6692649Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6692773Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6692830Z graph_break []
2025-12-04T10:11:57.6692959Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6693653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6693774Z   if out == self.unknown_value:
2025-12-04T10:11:57.6693898Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6693988Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6694115Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6694457Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6694516Z graph_break []
2025-12-04T10:11:57.6694646Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6694736Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6694861Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6695200Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6695262Z graph_break []
2025-12-04T10:11:57.6695750Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-84307678eab5d217.xml -
2025-12-04T10:11:57.6695854Z =========================== short test summary info ============================
2025-12-04T10:11:57.6697209Z FAILED [0.4025s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6697246Z 
2025-12-04T10:11:57.6697373Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6697910Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6697913Z 
2025-12-04T10:11:57.6698069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6698174Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6698293Z ================== 1 failed, 57 deselected, 2 rerun in 11.85s ==================
2025-12-04T10:11:57.6698355Z Got exit code 1
2025-12-04T10:11:57.6698830Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6699075Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.6699342Z W1204 09:45:29.332000 47715 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6699733Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93f441dcac87b0dc.xml
2025-12-04T10:11:57.6699828Z ============================= test session starts ==============================
2025-12-04T10:11:57.6700043Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6700111Z cachedir: .pytest_cache
2025-12-04T10:11:57.6700420Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6700501Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6700566Z configfile: pytest.ini
2025-12-04T10:11:57.6700886Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6701056Z collecting ... collected 58 items / 24 deselected / 34 selected
2025-12-04T10:11:57.6701149Z stepcurrent: skipping 24 already run items.
2025-12-04T10:11:57.6701223Z Running 34 items in this shard
2025-12-04T10:11:57.6701230Z 
2025-12-04T10:11:57.6701726Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9248s] [  2%]
2025-12-04T10:11:57.6702211Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5342s] [  2%]
2025-12-04T10:11:57.6702656Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5290s] [  2%]
2025-12-04T10:11:57.6702662Z 
2025-12-04T10:11:57.6702745Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6703049Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6703126Z Traceback (most recent call last):
2025-12-04T10:11:57.6703435Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6703506Z     method(*args, **kwargs)
2025-12-04T10:11:57.6703864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6703933Z     method(*args, **kwargs)
2025-12-04T10:11:57.6704223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6704282Z     with policy():
2025-12-04T10:11:57.6704637Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6704704Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6705506Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6705513Z 
2025-12-04T10:11:57.6705643Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6706165Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6706168Z 
2025-12-04T10:11:57.6706331Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6706463Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6706562Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6707111Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6707239Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6707303Z graph_break []
2025-12-04T10:11:57.6707594Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6707673Z Traceback (most recent call last):
2025-12-04T10:11:57.6707974Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6708041Z     method(*args, **kwargs)
2025-12-04T10:11:57.6708377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6708440Z     method(*args, **kwargs)
2025-12-04T10:11:57.6708728Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6708792Z     with policy():
2025-12-04T10:11:57.6709090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6709160Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6709972Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6709979Z 
2025-12-04T10:11:57.6710102Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6710626Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6710630Z 
2025-12-04T10:11:57.6710788Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6710919Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6711012Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6711628Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6711757Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6711851Z graph_break []
2025-12-04T10:11:57.6711979Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6712067Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6712189Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6712730Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6712787Z graph_break []
2025-12-04T10:11:57.6712874Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6713161Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6713234Z Traceback (most recent call last):
2025-12-04T10:11:57.6713542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6713610Z     method(*args, **kwargs)
2025-12-04T10:11:57.6713904Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6713972Z     method(*args, **kwargs)
2025-12-04T10:11:57.6714259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6714322Z     with policy():
2025-12-04T10:11:57.6714617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6714683Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6715496Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6715540Z 
2025-12-04T10:11:57.6715664Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6716187Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6716190Z 
2025-12-04T10:11:57.6716348Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6716494Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6716585Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6717264Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6717400Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6717458Z graph_break []
2025-12-04T10:11:57.6717583Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6717675Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6717794Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6718449Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6718511Z graph_break []
2025-12-04T10:11:57.6718634Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6718726Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6718893Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6719427Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6719484Z graph_break []
2025-12-04T10:11:57.6720018Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93f441dcac87b0dc.xml -
2025-12-04T10:11:57.6720127Z =========================== short test summary info ============================
2025-12-04T10:11:57.6721410Z FAILED [0.5290s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6721418Z 
2025-12-04T10:11:57.6721550Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6722065Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6722069Z 
2025-12-04T10:11:57.6722229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6722343Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6722463Z ================== 1 failed, 24 deselected, 2 rerun in 3.01s ===================
2025-12-04T10:11:57.6722528Z Got exit code 1
2025-12-04T10:11:57.6722652Z Retrying single test...
2025-12-04T10:11:57.6722914Z W1204 09:45:38.987000 47904 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6723300Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-093a939a7121f539.xml
2025-12-04T10:11:57.6723398Z ============================= test session starts ==============================
2025-12-04T10:11:57.6723608Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6723672Z cachedir: .pytest_cache
2025-12-04T10:11:57.6723982Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6724067Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6724132Z configfile: pytest.ini
2025-12-04T10:11:57.6724453Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6724585Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6725152Z stepcurrent: skipping 24 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6725239Z Running 1 items in this shard
2025-12-04T10:11:57.6725243Z 
2025-12-04T10:11:57.6726046Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:45:40.571251588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6726051Z 
2025-12-04T10:11:57.6726354Z [W1204 09:45:49.638281803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6726402Z 
2025-12-04T10:11:57.6726693Z [W1204 09:45:49.638538838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6726696Z 
2025-12-04T10:11:57.6726989Z [W1204 09:45:49.644813355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6726993Z 
2025-12-04T10:11:57.6727281Z [W1204 09:45:49.645408526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6727284Z 
2025-12-04T10:11:57.6727576Z [W1204 09:45:49.645581328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6727580Z 
2025-12-04T10:11:57.6727865Z [W1204 09:45:49.651069323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6727871Z 
2025-12-04T10:11:57.6728157Z [W1204 09:45:49.651617882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6728164Z 
2025-12-04T10:11:57.6728459Z [W1204 09:45:49.651788215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6728462Z 
2025-12-04T10:11:57.6728542Z ('RERUN', {'yellow': True}) [11.0106s] [100%]
2025-12-04T10:11:57.6729269Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:45:50.454721756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6729272Z 
2025-12-04T10:11:57.6729560Z [W1204 09:45:50.455227735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6729603Z 
2025-12-04T10:11:57.6729895Z [W1204 09:45:50.455373717 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6729902Z 
2025-12-04T10:11:57.6730188Z [W1204 09:45:50.458241117 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6730192Z 
2025-12-04T10:11:57.6730479Z [W1204 09:45:50.458689504 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6730483Z 
2025-12-04T10:11:57.6730774Z [W1204 09:45:50.458828437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6730777Z 
2025-12-04T10:11:57.6731063Z [W1204 09:45:50.463284513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6731075Z 
2025-12-04T10:11:57.6731364Z [W1204 09:45:50.463735671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6731367Z 
2025-12-04T10:11:57.6731656Z [W1204 09:45:50.463871533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6731659Z 
2025-12-04T10:11:57.6731741Z ('RERUN', {'yellow': True}) [0.4941s] [100%]
2025-12-04T10:11:57.6732546Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:45:51.945541782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6732551Z 
2025-12-04T10:11:57.6732856Z [W1204 09:45:51.946052661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6732894Z 
2025-12-04T10:11:57.6733186Z [W1204 09:45:51.946191043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6733189Z 
2025-12-04T10:11:57.6733479Z [W1204 09:45:51.949049642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6733482Z 
2025-12-04T10:11:57.6733766Z [W1204 09:45:51.949488750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6733769Z 
2025-12-04T10:11:57.6734062Z [W1204 09:45:51.949629272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6734066Z 
2025-12-04T10:11:57.6734353Z [W1204 09:45:51.954107988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6734358Z 
2025-12-04T10:11:57.6734645Z [W1204 09:45:51.954565266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6734651Z 
2025-12-04T10:11:57.6734937Z [W1204 09:45:51.954704299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6734940Z 
2025-12-04T10:11:57.6735006Z FAILED [0.4933s] [100%]
2025-12-04T10:11:57.6735009Z 
2025-12-04T10:11:57.6735094Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6735399Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6735477Z Traceback (most recent call last):
2025-12-04T10:11:57.6735790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6735855Z     method(*args, **kwargs)
2025-12-04T10:11:57.6736153Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6736254Z     method(*args, **kwargs)
2025-12-04T10:11:57.6736547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6736611Z     with policy():
2025-12-04T10:11:57.6736907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6736976Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6737786Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6737790Z 
2025-12-04T10:11:57.6737918Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6738446Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6738450Z 
2025-12-04T10:11:57.6738608Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6738739Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6738832Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6739443Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6739576Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6739668Z graph_break []
2025-12-04T10:11:57.6739803Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6740493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6740563Z   if out == self.unknown_value:
2025-12-04T10:11:57.6740868Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6740943Z Traceback (most recent call last):
2025-12-04T10:11:57.6741249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6741312Z     method(*args, **kwargs)
2025-12-04T10:11:57.6741602Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6741671Z     method(*args, **kwargs)
2025-12-04T10:11:57.6741961Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6742022Z     with policy():
2025-12-04T10:11:57.6742319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6742385Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6743199Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6743203Z 
2025-12-04T10:11:57.6743328Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6743854Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6743896Z 
2025-12-04T10:11:57.6744054Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6744179Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6744275Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6744821Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6744951Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6745010Z graph_break []
2025-12-04T10:11:57.6745134Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6745829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6745898Z   if out == self.unknown_value:
2025-12-04T10:11:57.6746022Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6746112Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6746237Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6746845Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6746905Z graph_break []
2025-12-04T10:11:57.6747099Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6747396Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6747469Z Traceback (most recent call last):
2025-12-04T10:11:57.6747770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6747836Z     method(*args, **kwargs)
2025-12-04T10:11:57.6748127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6748195Z     method(*args, **kwargs)
2025-12-04T10:11:57.6748508Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6748570Z     with policy():
2025-12-04T10:11:57.6748871Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6748939Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6749751Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6749755Z 
2025-12-04T10:11:57.6749877Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6750399Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6750403Z 
2025-12-04T10:11:57.6750559Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6750684Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6750817Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6751360Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6751487Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6751545Z graph_break []
2025-12-04T10:11:57.6751669Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6752361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6752428Z   if out == self.unknown_value:
2025-12-04T10:11:57.6752555Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6752644Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6752766Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6753309Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6753367Z graph_break []
2025-12-04T10:11:57.6753489Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6753646Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6753767Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6754308Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6754401Z graph_break []
2025-12-04T10:11:57.6754885Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-093a939a7121f539.xml -
2025-12-04T10:11:57.6754989Z =========================== short test summary info ============================
2025-12-04T10:11:57.6756279Z FAILED [0.4933s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6756285Z 
2025-12-04T10:11:57.6756410Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6756933Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6756936Z 
2025-12-04T10:11:57.6757097Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6757205Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6757321Z ================== 1 failed, 57 deselected, 2 rerun in 12.02s ==================
2025-12-04T10:11:57.6757385Z Got exit code 1
2025-12-04T10:11:57.6757452Z Retrying single test...
2025-12-04T10:11:57.6757718Z W1204 09:45:57.580000 48098 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6758103Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ed7e6d31e19a7f77.xml
2025-12-04T10:11:57.6758257Z ============================= test session starts ==============================
2025-12-04T10:11:57.6758468Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6758532Z cachedir: .pytest_cache
2025-12-04T10:11:57.6758852Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6758930Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6758994Z configfile: pytest.ini
2025-12-04T10:11:57.6759317Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6759445Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6760046Z stepcurrent: skipping 24 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6760123Z Running 1 items in this shard
2025-12-04T10:11:57.6760126Z 
2025-12-04T10:11:57.6760850Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:45:59.177050356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6760854Z 
2025-12-04T10:11:57.6761249Z [W1204 09:46:08.077094451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6761253Z 
2025-12-04T10:11:57.6761548Z [W1204 09:46:08.077350555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6761585Z 
2025-12-04T10:11:57.6761877Z [W1204 09:46:08.083408479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6761881Z 
2025-12-04T10:11:57.6762166Z [W1204 09:46:08.084024229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6762170Z 
2025-12-04T10:11:57.6762461Z [W1204 09:46:08.084201712 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6762464Z 
2025-12-04T10:11:57.6762754Z [W1204 09:46:08.089699206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6762757Z 
2025-12-04T10:11:57.6763050Z [W1204 09:46:08.090265036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6763053Z 
2025-12-04T10:11:57.6763341Z [W1204 09:46:08.090441719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6763346Z 
2025-12-04T10:11:57.6763426Z ('RERUN', {'yellow': True}) [10.8494s] [100%]
2025-12-04T10:11:57.6764148Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:46:08.891393948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6764152Z 
2025-12-04T10:11:57.6764443Z [W1204 09:46:08.891917327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6764446Z 
2025-12-04T10:11:57.6764739Z [W1204 09:46:08.892057639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6764743Z 
2025-12-04T10:11:57.6765030Z [W1204 09:46:08.895029540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6765068Z 
2025-12-04T10:11:57.6765371Z [W1204 09:46:08.895487578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6765376Z 
2025-12-04T10:11:57.6765668Z [W1204 09:46:08.895626890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6765671Z 
2025-12-04T10:11:57.6765963Z [W1204 09:46:08.900307750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6765967Z 
2025-12-04T10:11:57.6766253Z [W1204 09:46:08.900789989 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6766257Z 
2025-12-04T10:11:57.6766543Z [W1204 09:46:08.900926601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6766552Z 
2025-12-04T10:11:57.6766629Z ('RERUN', {'yellow': True}) [0.4990s] [100%]
2025-12-04T10:11:57.6767353Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:46:09.386663421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6767356Z 
2025-12-04T10:11:57.6767713Z [W1204 09:46:09.387173520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6767716Z 
2025-12-04T10:11:57.6768014Z [W1204 09:46:09.387312682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6768017Z 
2025-12-04T10:11:57.6768343Z [W1204 09:46:09.390234362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6768348Z 
2025-12-04T10:11:57.6768634Z [W1204 09:46:09.390690769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6768637Z 
2025-12-04T10:11:57.6768929Z [W1204 09:46:09.390829362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6768932Z 
2025-12-04T10:11:57.6769218Z [W1204 09:46:09.395373209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6769224Z 
2025-12-04T10:11:57.6769515Z [W1204 09:46:09.395824187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6769518Z 
2025-12-04T10:11:57.6769805Z [W1204 09:46:09.395959999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6769810Z 
2025-12-04T10:11:57.6769870Z FAILED [0.4940s] [100%]
2025-12-04T10:11:57.6769873Z 
2025-12-04T10:11:57.6769959Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6770250Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6770327Z Traceback (most recent call last):
2025-12-04T10:11:57.6770643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6770714Z     method(*args, **kwargs)
2025-12-04T10:11:57.6771013Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6771075Z     method(*args, **kwargs)
2025-12-04T10:11:57.6771367Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6771469Z     with policy():
2025-12-04T10:11:57.6771764Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6771834Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6772628Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6772635Z 
2025-12-04T10:11:57.6772766Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6773287Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6773293Z 
2025-12-04T10:11:57.6773451Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6773586Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6773679Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6774226Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6774746Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6774810Z graph_break []
2025-12-04T10:11:57.6774938Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6775626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6775741Z   if out == self.unknown_value:
2025-12-04T10:11:57.6776029Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6776104Z Traceback (most recent call last):
2025-12-04T10:11:57.6776404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6776468Z     method(*args, **kwargs)
2025-12-04T10:11:57.6776764Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6776830Z     method(*args, **kwargs)
2025-12-04T10:11:57.6777121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6777185Z     with policy():
2025-12-04T10:11:57.6777481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6777546Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6778365Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6778371Z 
2025-12-04T10:11:57.6778498Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6779023Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6779028Z 
2025-12-04T10:11:57.6779181Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6779346Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6779445Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6779991Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6780124Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6780184Z graph_break []
2025-12-04T10:11:57.6780309Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6781010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6781082Z   if out == self.unknown_value:
2025-12-04T10:11:57.6781211Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6781307Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6781431Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6781973Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6782032Z graph_break []
2025-12-04T10:11:57.6782188Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6782482Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6782589Z Traceback (most recent call last):
2025-12-04T10:11:57.6782897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6782960Z     method(*args, **kwargs)
2025-12-04T10:11:57.6783255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6783322Z     method(*args, **kwargs)
2025-12-04T10:11:57.6783611Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6783674Z     with policy():
2025-12-04T10:11:57.6783977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6784047Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6784863Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6784871Z 
2025-12-04T10:11:57.6784997Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6785517Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6785520Z 
2025-12-04T10:11:57.6785680Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6785807Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6785901Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6786441Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6786631Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6786689Z graph_break []
2025-12-04T10:11:57.6786813Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6787506Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6787578Z   if out == self.unknown_value:
2025-12-04T10:11:57.6787702Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6787791Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6787919Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6788464Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6788525Z graph_break []
2025-12-04T10:11:57.6788650Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6788736Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6788854Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6789462Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6789521Z graph_break []
2025-12-04T10:11:57.6790015Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ed7e6d31e19a7f77.xml -
2025-12-04T10:11:57.6790152Z =========================== short test summary info ============================
2025-12-04T10:11:57.6791437Z FAILED [0.4940s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6791444Z 
2025-12-04T10:11:57.6791569Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6792085Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6792091Z 
2025-12-04T10:11:57.6792253Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6792358Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6792477Z ================== 1 failed, 57 deselected, 2 rerun in 11.87s ==================
2025-12-04T10:11:57.6792535Z Got exit code 1
2025-12-04T10:11:57.6793005Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6793267Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.6793532Z W1204 09:46:15.975000 48292 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6793933Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fcb36d28ba877da8.xml
2025-12-04T10:11:57.6794068Z ============================= test session starts ==============================
2025-12-04T10:11:57.6794274Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6794347Z cachedir: .pytest_cache
2025-12-04T10:11:57.6794652Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6794729Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6794798Z configfile: pytest.ini
2025-12-04T10:11:57.6795112Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6795243Z collecting ... collected 58 items / 25 deselected / 33 selected
2025-12-04T10:11:57.6795330Z stepcurrent: skipping 25 already run items.
2025-12-04T10:11:57.6795402Z Running 33 items in this shard
2025-12-04T10:11:57.6795406Z 
2025-12-04T10:11:57.6795907Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8940s] [  3%]
2025-12-04T10:11:57.6796397Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5010s] [  3%]
2025-12-04T10:11:57.6796913Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4898s] [  3%]
2025-12-04T10:11:57.6796917Z 
2025-12-04T10:11:57.6797001Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6797296Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6797410Z Traceback (most recent call last):
2025-12-04T10:11:57.6797715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6797783Z     method(*args, **kwargs)
2025-12-04T10:11:57.6798077Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6798139Z     method(*args, **kwargs)
2025-12-04T10:11:57.6798437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6798499Z     with policy():
2025-12-04T10:11:57.6798794Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6798863Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6799671Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6799677Z 
2025-12-04T10:11:57.6799807Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6800394Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6800398Z 
2025-12-04T10:11:57.6800563Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6800692Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6800785Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6801140Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6801308Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6801372Z graph_break []
2025-12-04T10:11:57.6801658Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6801731Z Traceback (most recent call last):
2025-12-04T10:11:57.6802031Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6802097Z     method(*args, **kwargs)
2025-12-04T10:11:57.6802391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6802457Z     method(*args, **kwargs)
2025-12-04T10:11:57.6802751Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6802815Z     with policy():
2025-12-04T10:11:57.6803107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6803172Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6803988Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6804059Z 
2025-12-04T10:11:57.6804185Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6804707Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6804746Z 
2025-12-04T10:11:57.6804902Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6805028Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6805124Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6805471Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6805600Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6805659Z graph_break []
2025-12-04T10:11:57.6805784Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6805877Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6805998Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6806345Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6806405Z graph_break []
2025-12-04T10:11:57.6806486Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6806779Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6806852Z Traceback (most recent call last):
2025-12-04T10:11:57.6807150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6807219Z     method(*args, **kwargs)
2025-12-04T10:11:57.6807513Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6807576Z     method(*args, **kwargs)
2025-12-04T10:11:57.6807868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6807966Z     with policy():
2025-12-04T10:11:57.6808274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6808339Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6809162Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6809169Z 
2025-12-04T10:11:57.6809295Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6809812Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6809819Z 
2025-12-04T10:11:57.6809978Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6810101Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6810194Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6810534Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6810657Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6810787Z graph_break []
2025-12-04T10:11:57.6810911Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6811003Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6811122Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6811523Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6811585Z graph_break []
2025-12-04T10:11:57.6811707Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6811794Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6811920Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6812256Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6812321Z graph_break []
2025-12-04T10:11:57.6812808Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fcb36d28ba877da8.xml -
2025-12-04T10:11:57.6812909Z =========================== short test summary info ============================
2025-12-04T10:11:57.6814206Z FAILED [0.4898s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6814210Z 
2025-12-04T10:11:57.6814336Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6814860Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6814864Z 
2025-12-04T10:11:57.6815021Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6815168Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6815282Z ================== 1 failed, 25 deselected, 2 rerun in 2.91s ===================
2025-12-04T10:11:57.6815341Z Got exit code 1
2025-12-04T10:11:57.6815420Z Retrying single test...
2025-12-04T10:11:57.6815688Z W1204 09:46:25.698000 48481 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6816073Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb130a6ac4c4d42.xml
2025-12-04T10:11:57.6816175Z ============================= test session starts ==============================
2025-12-04T10:11:57.6816385Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6816456Z cachedir: .pytest_cache
2025-12-04T10:11:57.6816766Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6816843Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6816914Z configfile: pytest.ini
2025-12-04T10:11:57.6817341Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6817479Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6818129Z stepcurrent: skipping 25 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6818203Z Running 1 items in this shard
2025-12-04T10:11:57.6818206Z 
2025-12-04T10:11:57.6818940Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:46:26.781109894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6818989Z 
2025-12-04T10:11:57.6819293Z [W1204 09:46:35.738526287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6819297Z 
2025-12-04T10:11:57.6819593Z [W1204 09:46:35.738779521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6819597Z 
2025-12-04T10:11:57.6819887Z [W1204 09:46:35.745010618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6819891Z 
2025-12-04T10:11:57.6820183Z [W1204 09:46:35.745607378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6820188Z 
2025-12-04T10:11:57.6820476Z [W1204 09:46:35.745778641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6820481Z 
2025-12-04T10:11:57.6820774Z [W1204 09:46:35.751146243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6820777Z 
2025-12-04T10:11:57.6821063Z [W1204 09:46:35.751684102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6821067Z 
2025-12-04T10:11:57.6821355Z [W1204 09:46:35.751847365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6821364Z 
2025-12-04T10:11:57.6821447Z ('RERUN', {'yellow': True}) [10.8398s] [100%]
2025-12-04T10:11:57.6822172Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:46:37.951592257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6822212Z 
2025-12-04T10:11:57.6822508Z [W1204 09:46:37.952102836 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6822511Z 
2025-12-04T10:11:57.6822803Z [W1204 09:46:37.952248798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6822806Z 
2025-12-04T10:11:57.6823101Z [W1204 09:46:37.955093097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6823104Z 
2025-12-04T10:11:57.6823393Z [W1204 09:46:37.955636037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6823396Z 
2025-12-04T10:11:57.6823687Z [W1204 09:46:37.955774469 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6823693Z 
2025-12-04T10:11:57.6823989Z [W1204 09:46:37.960146324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6823993Z 
2025-12-04T10:11:57.6824287Z [W1204 09:46:37.960607612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6824290Z 
2025-12-04T10:11:57.6824582Z [W1204 09:46:37.960745225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6824660Z 
2025-12-04T10:11:57.6824740Z ('RERUN', {'yellow': True}) [0.4432s] [100%]
2025-12-04T10:11:57.6825465Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:46:37.393116511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6825504Z 
2025-12-04T10:11:57.6825793Z [W1204 09:46:37.393626150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6825796Z 
2025-12-04T10:11:57.6826089Z [W1204 09:46:37.393764312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6826093Z 
2025-12-04T10:11:57.6826381Z [W1204 09:46:37.396576890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6826384Z 
2025-12-04T10:11:57.6826677Z [W1204 09:46:37.397114290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6826680Z 
2025-12-04T10:11:57.6826966Z [W1204 09:46:37.397252212 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6826972Z 
2025-12-04T10:11:57.6827262Z [W1204 09:46:37.401614267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6827266Z 
2025-12-04T10:11:57.6827565Z [W1204 09:46:37.402068125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6827568Z 
2025-12-04T10:11:57.6827858Z [W1204 09:46:37.402203177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6827867Z 
2025-12-04T10:11:57.6827930Z FAILED [0.4408s] [100%]
2025-12-04T10:11:57.6827933Z 
2025-12-04T10:11:57.6828021Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6828324Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6828437Z Traceback (most recent call last):
2025-12-04T10:11:57.6828743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6828813Z     method(*args, **kwargs)
2025-12-04T10:11:57.6829106Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6829172Z     method(*args, **kwargs)
2025-12-04T10:11:57.6829463Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6829524Z     with policy():
2025-12-04T10:11:57.6829826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6829893Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6830703Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6830711Z 
2025-12-04T10:11:57.6830842Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6831361Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6831368Z 
2025-12-04T10:11:57.6831594Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6831726Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6831822Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6832200Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6832329Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6832392Z graph_break []
2025-12-04T10:11:57.6832516Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6833211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6833287Z   if out == self.unknown_value:
2025-12-04T10:11:57.6833579Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6833658Z Traceback (most recent call last):
2025-12-04T10:11:57.6833960Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6834029Z     method(*args, **kwargs)
2025-12-04T10:11:57.6834320Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6834381Z     method(*args, **kwargs)
2025-12-04T10:11:57.6834677Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6834736Z     with policy():
2025-12-04T10:11:57.6835030Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6835100Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6835927Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6835970Z 
2025-12-04T10:11:57.6836102Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6836621Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6836625Z 
2025-12-04T10:11:57.6836785Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6836912Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6837008Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6837358Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6837488Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6837546Z graph_break []
2025-12-04T10:11:57.6837676Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6838372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6838452Z   if out == self.unknown_value:
2025-12-04T10:11:57.6838576Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6838752Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6838879Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6839226Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6839323Z graph_break []
2025-12-04T10:11:57.6839407Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6839696Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6839778Z Traceback (most recent call last):
2025-12-04T10:11:57.6840135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6840201Z     method(*args, **kwargs)
2025-12-04T10:11:57.6840498Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6840560Z     method(*args, **kwargs)
2025-12-04T10:11:57.6840854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6840915Z     with policy():
2025-12-04T10:11:57.6841210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6841284Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6842097Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6842101Z 
2025-12-04T10:11:57.6842228Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6842749Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6842753Z 
2025-12-04T10:11:57.6842918Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6843085Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6843175Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6843519Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6843642Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6843702Z graph_break []
2025-12-04T10:11:57.6843828Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6844518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6844592Z   if out == self.unknown_value:
2025-12-04T10:11:57.6844718Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6844810Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6844934Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6845274Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6845336Z graph_break []
2025-12-04T10:11:57.6845457Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6845611Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6845736Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6846074Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6846166Z graph_break []
2025-12-04T10:11:57.6846655Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb130a6ac4c4d42.xml -
2025-12-04T10:11:57.6846756Z =========================== short test summary info ============================
2025-12-04T10:11:57.6848054Z FAILED [0.4408s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6848059Z 
2025-12-04T10:11:57.6848186Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6848721Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6848725Z 
2025-12-04T10:11:57.6848882Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6848988Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6849104Z ================== 1 failed, 57 deselected, 2 rerun in 11.75s ==================
2025-12-04T10:11:57.6849163Z Got exit code 1
2025-12-04T10:11:57.6849230Z Retrying single test...
2025-12-04T10:11:57.6849499Z W1204 09:46:44.099000 48674 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6849883Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d436a35a57eaea90.xml
2025-12-04T10:11:57.6850021Z ============================= test session starts ==============================
2025-12-04T10:11:57.6850229Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6850298Z cachedir: .pytest_cache
2025-12-04T10:11:57.6850602Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6850678Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6850750Z configfile: pytest.ini
2025-12-04T10:11:57.6851071Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6851200Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6851774Z stepcurrent: skipping 25 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6851847Z Running 1 items in this shard
2025-12-04T10:11:57.6851851Z 
2025-12-04T10:11:57.6852585Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:46:45.197260818 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6852589Z 
2025-12-04T10:11:57.6852886Z [W1204 09:46:54.444797202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6852955Z 
2025-12-04T10:11:57.6853252Z [W1204 09:46:54.445047086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6853256Z 
2025-12-04T10:11:57.6853543Z [W1204 09:46:54.451006968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6853584Z 
2025-12-04T10:11:57.6853872Z [W1204 09:46:54.451608368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6853875Z 
2025-12-04T10:11:57.6854161Z [W1204 09:46:54.451776261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6854164Z 
2025-12-04T10:11:57.6854451Z [W1204 09:46:54.457431578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6854457Z 
2025-12-04T10:11:57.6854747Z [W1204 09:46:54.457982597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6854750Z 
2025-12-04T10:11:57.6855036Z [W1204 09:46:54.458143790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6855042Z 
2025-12-04T10:11:57.6855140Z ('RERUN', {'yellow': True}) [11.1549s] [100%]
2025-12-04T10:11:57.6855866Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:46:55.670629160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6855870Z 
2025-12-04T10:11:57.6856160Z [W1204 09:46:55.671172150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6856164Z 
2025-12-04T10:11:57.6856454Z [W1204 09:46:55.671318762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6856457Z 
2025-12-04T10:11:57.6856751Z [W1204 09:46:55.674330774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6856791Z 
2025-12-04T10:11:57.6857079Z [W1204 09:46:55.674908434 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6857082Z 
2025-12-04T10:11:57.6857369Z [W1204 09:46:55.675049736 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6857376Z 
2025-12-04T10:11:57.6857662Z [W1204 09:46:55.679723737 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6857665Z 
2025-12-04T10:11:57.6857955Z [W1204 09:46:55.680214216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6857958Z 
2025-12-04T10:11:57.6858251Z [W1204 09:46:55.680371848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6858257Z 
2025-12-04T10:11:57.6858336Z ('RERUN', {'yellow': True}) [0.4530s] [100%]
2025-12-04T10:11:57.6859063Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:46:56.121249793 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6859067Z 
2025-12-04T10:11:57.6859355Z [W1204 09:46:56.121789622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6859358Z 
2025-12-04T10:11:57.6859797Z [W1204 09:46:56.121933844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6859800Z 
2025-12-04T10:11:57.6860088Z [W1204 09:46:56.124920655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6860124Z 
2025-12-04T10:11:57.6860419Z [W1204 09:46:56.125483765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6860422Z 
2025-12-04T10:11:57.6860708Z [W1204 09:46:56.125621698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6860711Z 
2025-12-04T10:11:57.6860994Z [W1204 09:46:56.130344708 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6861003Z 
2025-12-04T10:11:57.6861291Z [W1204 09:46:56.130819956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6861294Z 
2025-12-04T10:11:57.6861579Z [W1204 09:46:56.130957259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6861585Z 
2025-12-04T10:11:57.6861650Z FAILED [0.4479s] [100%]
2025-12-04T10:11:57.6861654Z 
2025-12-04T10:11:57.6861734Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6862030Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6862105Z Traceback (most recent call last):
2025-12-04T10:11:57.6862410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6862482Z     method(*args, **kwargs)
2025-12-04T10:11:57.6862775Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6862839Z     method(*args, **kwargs)
2025-12-04T10:11:57.6863138Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6863234Z     with policy():
2025-12-04T10:11:57.6863531Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6863596Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6864400Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6864409Z 
2025-12-04T10:11:57.6864539Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6865056Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6865061Z 
2025-12-04T10:11:57.6865224Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6865362Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6865461Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6865814Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6865941Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6866001Z graph_break []
2025-12-04T10:11:57.6866209Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6866903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6867014Z   if out == self.unknown_value:
2025-12-04T10:11:57.6867302Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6867382Z Traceback (most recent call last):
2025-12-04T10:11:57.6867680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6867744Z     method(*args, **kwargs)
2025-12-04T10:11:57.6868038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6868101Z     method(*args, **kwargs)
2025-12-04T10:11:57.6868397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6868456Z     with policy():
2025-12-04T10:11:57.6868756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6868832Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6869647Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6869651Z 
2025-12-04T10:11:57.6869791Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6870315Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6870319Z 
2025-12-04T10:11:57.6870475Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6870604Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6870737Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6871089Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6871218Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6871277Z graph_break []
2025-12-04T10:11:57.6871403Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6872096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6872175Z   if out == self.unknown_value:
2025-12-04T10:11:57.6872298Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6872392Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6872522Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6872864Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6872924Z graph_break []
2025-12-04T10:11:57.6873011Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6873300Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.6873445Z Traceback (most recent call last):
2025-12-04T10:11:57.6873745Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6873807Z     method(*args, **kwargs)
2025-12-04T10:11:57.6874111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6874209Z     method(*args, **kwargs)
2025-12-04T10:11:57.6874497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6874561Z     with policy():
2025-12-04T10:11:57.6874855Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6874924Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6875742Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6875746Z 
2025-12-04T10:11:57.6875872Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6876394Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6876397Z 
2025-12-04T10:11:57.6876553Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6876684Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6876775Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6877124Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6877246Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6877304Z graph_break []
2025-12-04T10:11:57.6877430Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6878154Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6878223Z   if out == self.unknown_value:
2025-12-04T10:11:57.6878348Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6878436Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6878559Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6878903Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6878961Z graph_break []
2025-12-04T10:11:57.6879088Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6879189Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6879318Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6879661Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6879719Z graph_break []
2025-12-04T10:11:57.6880253Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d436a35a57eaea90.xml -
2025-12-04T10:11:57.6880354Z =========================== short test summary info ============================
2025-12-04T10:11:57.6881719Z FAILED [0.4479s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6881758Z 
2025-12-04T10:11:57.6881883Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6882405Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6882408Z 
2025-12-04T10:11:57.6882564Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6882670Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6882789Z ================== 1 failed, 57 deselected, 2 rerun in 12.08s ==================
2025-12-04T10:11:57.6882849Z Got exit code 1
2025-12-04T10:11:57.6883326Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.6883574Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.6883834Z W1204 09:47:02.758000 48867 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6884222Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76b1db4df066ac09.xml
2025-12-04T10:11:57.6884318Z ============================= test session starts ==============================
2025-12-04T10:11:57.6884523Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6884593Z cachedir: .pytest_cache
2025-12-04T10:11:57.6884899Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6885019Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6885087Z configfile: pytest.ini
2025-12-04T10:11:57.6885399Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6885530Z collecting ... collected 58 items / 26 deselected / 32 selected
2025-12-04T10:11:57.6885618Z stepcurrent: skipping 26 already run items.
2025-12-04T10:11:57.6885688Z Running 32 items in this shard
2025-12-04T10:11:57.6885697Z 
2025-12-04T10:11:57.6886193Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8462s] [  3%]
2025-12-04T10:11:57.6886674Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4490s] [  3%]
2025-12-04T10:11:57.6887121Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4405s] [  3%]
2025-12-04T10:11:57.6887125Z 
2025-12-04T10:11:57.6887206Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6887497Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6887571Z Traceback (most recent call last):
2025-12-04T10:11:57.6887944Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6888016Z     method(*args, **kwargs)
2025-12-04T10:11:57.6888313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6888414Z     method(*args, **kwargs)
2025-12-04T10:11:57.6888706Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6888770Z     with policy():
2025-12-04T10:11:57.6889078Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6889155Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6889953Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6889963Z 
2025-12-04T10:11:57.6890090Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6890601Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6890609Z 
2025-12-04T10:11:57.6890772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6890903Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6891000Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6891345Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6891474Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6891537Z graph_break []
2025-12-04T10:11:57.6891824Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6891897Z Traceback (most recent call last):
2025-12-04T10:11:57.6892256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6892321Z     method(*args, **kwargs)
2025-12-04T10:11:57.6892613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6892678Z     method(*args, **kwargs)
2025-12-04T10:11:57.6892966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6893031Z     with policy():
2025-12-04T10:11:57.6893331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6893395Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6894200Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6894207Z 
2025-12-04T10:11:57.6894329Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6894850Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6894853Z 
2025-12-04T10:11:57.6895012Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6895205Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6895299Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6895644Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6895808Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6895869Z graph_break []
2025-12-04T10:11:57.6895998Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6896089Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6896213Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6896559Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6896616Z graph_break []
2025-12-04T10:11:57.6896702Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6896996Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6897071Z Traceback (most recent call last):
2025-12-04T10:11:57.6897380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6897443Z     method(*args, **kwargs)
2025-12-04T10:11:57.6897733Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6897802Z     method(*args, **kwargs)
2025-12-04T10:11:57.6898092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6898151Z     with policy():
2025-12-04T10:11:57.6898448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6898513Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6899320Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6899365Z 
2025-12-04T10:11:57.6899490Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6900009Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6900014Z 
2025-12-04T10:11:57.6900176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6900305Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6900399Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6900741Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6900869Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6900926Z graph_break []
2025-12-04T10:11:57.6901052Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6901143Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6901265Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6901607Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6901673Z graph_break []
2025-12-04T10:11:57.6901867Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6901966Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6902087Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6902456Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6902520Z graph_break []
2025-12-04T10:11:57.6903009Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76b1db4df066ac09.xml -
2025-12-04T10:11:57.6903114Z =========================== short test summary info ============================
2025-12-04T10:11:57.6904386Z FAILED [0.4405s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6904394Z 
2025-12-04T10:11:57.6904520Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6905035Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6905038Z 
2025-12-04T10:11:57.6905193Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6905307Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6905424Z ================== 1 failed, 26 deselected, 2 rerun in 2.76s ===================
2025-12-04T10:11:57.6905484Z Got exit code 1
2025-12-04T10:11:57.6905550Z Retrying single test...
2025-12-04T10:11:57.6905812Z W1204 09:47:12.424000 49048 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6906202Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bbaa588317639c61.xml
2025-12-04T10:11:57.6906341Z ============================= test session starts ==============================
2025-12-04T10:11:57.6906546Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6906617Z cachedir: .pytest_cache
2025-12-04T10:11:57.6906923Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6907000Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6907065Z configfile: pytest.ini
2025-12-04T10:11:57.6907383Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6907517Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6908081Z stepcurrent: skipping 26 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6908155Z Running 1 items in this shard
2025-12-04T10:11:57.6908165Z 
2025-12-04T10:11:57.6908892Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:47:13.468931279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6908897Z 
2025-12-04T10:11:57.6909264Z [W1204 09:47:22.608270471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6909272Z 
2025-12-04T10:11:57.6909565Z [W1204 09:47:22.608521445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6909603Z 
2025-12-04T10:11:57.6909892Z [W1204 09:47:22.614159122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6909895Z 
2025-12-04T10:11:57.6910199Z [W1204 09:47:22.614688740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6910202Z 
2025-12-04T10:11:57.6910496Z [W1204 09:47:22.614860534 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6910500Z 
2025-12-04T10:11:57.6910797Z [W1204 09:47:22.620152374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6910801Z 
2025-12-04T10:11:57.6911087Z [W1204 09:47:22.620685213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6911092Z 
2025-12-04T10:11:57.6911383Z [W1204 09:47:22.620847496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6911386Z 
2025-12-04T10:11:57.6911469Z ('RERUN', {'yellow': True}) [10.9930s] [100%]
2025-12-04T10:11:57.6912189Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:47:23.796426214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6912198Z 
2025-12-04T10:11:57.6912493Z [W1204 09:47:23.796999693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6912496Z 
2025-12-04T10:11:57.6912786Z [W1204 09:47:23.797144736 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6912827Z 
2025-12-04T10:11:57.6913119Z [W1204 09:47:23.800117977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6913122Z 
2025-12-04T10:11:57.6913409Z [W1204 09:47:23.800701457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6913412Z 
2025-12-04T10:11:57.6913703Z [W1204 09:47:23.800843069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6913706Z 
2025-12-04T10:11:57.6913996Z [W1204 09:47:23.805405008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6913999Z 
2025-12-04T10:11:57.6914290Z [W1204 09:47:23.805870696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6914295Z 
2025-12-04T10:11:57.6914586Z [W1204 09:47:23.806008088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6914590Z 
2025-12-04T10:11:57.6914671Z ('RERUN', {'yellow': True}) [0.4142s] [100%]
2025-12-04T10:11:57.6915387Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:47:24.207798797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6915391Z 
2025-12-04T10:11:57.6915745Z [W1204 09:47:24.208371647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6915753Z 
2025-12-04T10:11:57.6916039Z [W1204 09:47:24.208523710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6916100Z 
2025-12-04T10:11:57.6916393Z [W1204 09:47:24.211498741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6916396Z 
2025-12-04T10:11:57.6916687Z [W1204 09:47:24.212069401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6916690Z 
2025-12-04T10:11:57.6916977Z [W1204 09:47:24.212210073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6917088Z 
2025-12-04T10:11:57.6917386Z [W1204 09:47:24.216742481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6917389Z 
2025-12-04T10:11:57.6917677Z [W1204 09:47:24.217204769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6917680Z 
2025-12-04T10:11:57.6917968Z [W1204 09:47:24.217342092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6917973Z 
2025-12-04T10:11:57.6918033Z FAILED [0.4100s] [100%]
2025-12-04T10:11:57.6918037Z 
2025-12-04T10:11:57.6918120Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6918415Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6918489Z Traceback (most recent call last):
2025-12-04T10:11:57.6918801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6918867Z     method(*args, **kwargs)
2025-12-04T10:11:57.6919162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6919228Z     method(*args, **kwargs)
2025-12-04T10:11:57.6919520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6919650Z     with policy():
2025-12-04T10:11:57.6919985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6920053Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6920871Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6920875Z 
2025-12-04T10:11:57.6921003Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6921521Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6921528Z 
2025-12-04T10:11:57.6921687Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6921814Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6921915Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6922260Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6922390Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6922547Z graph_break []
2025-12-04T10:11:57.6922675Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6923374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6923492Z   if out == self.unknown_value:
2025-12-04T10:11:57.6923784Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6923862Z Traceback (most recent call last):
2025-12-04T10:11:57.6924161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6924227Z     method(*args, **kwargs)
2025-12-04T10:11:57.6924521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6924584Z     method(*args, **kwargs)
2025-12-04T10:11:57.6924877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6924937Z     with policy():
2025-12-04T10:11:57.6925240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6925305Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6926104Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6926108Z 
2025-12-04T10:11:57.6926236Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6926754Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6926757Z 
2025-12-04T10:11:57.6926916Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6927080Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6927174Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6927524Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6927651Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6927728Z graph_break []
2025-12-04T10:11:57.6927854Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6928548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6928623Z   if out == self.unknown_value:
2025-12-04T10:11:57.6928749Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6928847Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6928970Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6929312Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6929373Z graph_break []
2025-12-04T10:11:57.6929456Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6929810Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6929888Z Traceback (most recent call last):
2025-12-04T10:11:57.6930188Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6930293Z     method(*args, **kwargs)
2025-12-04T10:11:57.6930593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6930657Z     method(*args, **kwargs)
2025-12-04T10:11:57.6930953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6931011Z     with policy():
2025-12-04T10:11:57.6931302Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6931382Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6932192Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6932198Z 
2025-12-04T10:11:57.6932328Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6932845Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6932848Z 
2025-12-04T10:11:57.6933009Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6933133Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6933223Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6933577Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6933699Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6933765Z graph_break []
2025-12-04T10:11:57.6933928Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6934613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6934686Z   if out == self.unknown_value:
2025-12-04T10:11:57.6934807Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6934899Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6935028Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6935371Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6935439Z graph_break []
2025-12-04T10:11:57.6935571Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6935662Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6935785Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6936127Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6936189Z graph_break []
2025-12-04T10:11:57.6936672Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bbaa588317639c61.xml -
2025-12-04T10:11:57.6936845Z =========================== short test summary info ============================
2025-12-04T10:11:57.6938129Z FAILED [0.4100s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6938167Z 
2025-12-04T10:11:57.6938291Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6938809Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6938812Z 
2025-12-04T10:11:57.6938969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6939075Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6939191Z ================== 1 failed, 57 deselected, 2 rerun in 11.84s ==================
2025-12-04T10:11:57.6939255Z Got exit code 1
2025-12-04T10:11:57.6939328Z Retrying single test...
2025-12-04T10:11:57.6939589Z W1204 09:47:30.809000 49234 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6939975Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7cb6908bcfc4804b.xml
2025-12-04T10:11:57.6940071Z ============================= test session starts ==============================
2025-12-04T10:11:57.6940279Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6940355Z cachedir: .pytest_cache
2025-12-04T10:11:57.6940660Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6940735Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6940806Z configfile: pytest.ini
2025-12-04T10:11:57.6941159Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6941293Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.6941863Z stepcurrent: skipping 26 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6941931Z Running 1 items in this shard
2025-12-04T10:11:57.6941934Z 
2025-12-04T10:11:57.6942671Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:47:31.867734784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6942675Z 
2025-12-04T10:11:57.6942971Z [W1204 09:47:41.987256046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6942977Z 
2025-12-04T10:11:57.6943273Z [W1204 09:47:41.987511401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6943277Z 
2025-12-04T10:11:57.6943568Z [W1204 09:47:41.993481393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6943571Z 
2025-12-04T10:11:57.6943859Z [W1204 09:47:41.994077613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6943863Z 
2025-12-04T10:11:57.6944252Z [W1204 09:47:41.994245386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6944256Z 
2025-12-04T10:11:57.6944552Z [W1204 09:47:41.999602488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6944591Z 
2025-12-04T10:11:57.6944880Z [W1204 09:47:41.000143247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6944882Z 
2025-12-04T10:11:57.6945169Z [W1204 09:47:41.000319840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6945176Z 
2025-12-04T10:11:57.6945256Z ('RERUN', {'yellow': True}) [10.9862s] [100%]
2025-12-04T10:11:57.6945975Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:47:42.176240715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6945979Z 
2025-12-04T10:11:57.6946274Z [W1204 09:47:42.176833565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6946279Z 
2025-12-04T10:11:57.6946567Z [W1204 09:47:42.176979528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6946570Z 
2025-12-04T10:11:57.6946861Z [W1204 09:47:42.179905278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6946865Z 
2025-12-04T10:11:57.6947154Z [W1204 09:47:42.180501868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6947157Z 
2025-12-04T10:11:57.6947450Z [W1204 09:47:42.180647451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6947453Z 
2025-12-04T10:11:57.6947740Z [W1204 09:47:42.185136827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6947781Z 
2025-12-04T10:11:57.6948074Z [W1204 09:47:42.185595005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6948077Z 
2025-12-04T10:11:57.6948366Z [W1204 09:47:42.185730668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6948369Z 
2025-12-04T10:11:57.6948450Z ('RERUN', {'yellow': True}) [0.4113s] [100%]
2025-12-04T10:11:57.6949190Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:47:42.584632488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6949194Z 
2025-12-04T10:11:57.6949483Z [W1204 09:47:42.585215048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6949490Z 
2025-12-04T10:11:57.6949781Z [W1204 09:47:42.585359050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6949785Z 
2025-12-04T10:11:57.6950071Z [W1204 09:47:42.588259700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6950075Z 
2025-12-04T10:11:57.6950365Z [W1204 09:47:42.588826930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6950368Z 
2025-12-04T10:11:57.6950721Z [W1204 09:47:42.588967022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6950725Z 
2025-12-04T10:11:57.6951019Z [W1204 09:47:42.593462549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6951057Z 
2025-12-04T10:11:57.6951344Z [W1204 09:47:42.593962038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6951347Z 
2025-12-04T10:11:57.6951633Z [W1204 09:47:42.594099000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.6951639Z 
2025-12-04T10:11:57.6951703Z FAILED [0.4066s] [100%]
2025-12-04T10:11:57.6951706Z 
2025-12-04T10:11:57.6951787Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6952084Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6952159Z Traceback (most recent call last):
2025-12-04T10:11:57.6952469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6952541Z     method(*args, **kwargs)
2025-12-04T10:11:57.6952836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6952903Z     method(*args, **kwargs)
2025-12-04T10:11:57.6953189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6953248Z     with policy():
2025-12-04T10:11:57.6953547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6953612Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6954410Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6954451Z 
2025-12-04T10:11:57.6954581Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6955099Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6955108Z 
2025-12-04T10:11:57.6955275Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6955403Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6955505Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6955855Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6955982Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6956047Z graph_break []
2025-12-04T10:11:57.6956173Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6956873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6956944Z   if out == self.unknown_value:
2025-12-04T10:11:57.6957232Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6957311Z Traceback (most recent call last):
2025-12-04T10:11:57.6957680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6957797Z     method(*args, **kwargs)
2025-12-04T10:11:57.6958154Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6958374Z     method(*args, **kwargs)
2025-12-04T10:11:57.6958724Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6958812Z     with policy():
2025-12-04T10:11:57.6963099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6963209Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6964063Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6964069Z 
2025-12-04T10:11:57.6964210Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6964748Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6964757Z 
2025-12-04T10:11:57.6964922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6965064Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6965164Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6965517Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6965657Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6965718Z graph_break []
2025-12-04T10:11:57.6965864Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6966574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6966723Z   if out == self.unknown_value:
2025-12-04T10:11:57.6966861Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6966959Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6967093Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6967442Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6967502Z graph_break []
2025-12-04T10:11:57.6967590Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6967891Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.6967973Z Traceback (most recent call last):
2025-12-04T10:11:57.6968314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6968381Z     method(*args, **kwargs)
2025-12-04T10:11:57.6968676Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6968737Z     method(*args, **kwargs)
2025-12-04T10:11:57.6969027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6969163Z     with policy():
2025-12-04T10:11:57.6969473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6969544Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6970359Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6970400Z 
2025-12-04T10:11:57.6970535Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6971058Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6971062Z 
2025-12-04T10:11:57.6971228Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6971363Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6971459Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6971813Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6971950Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6972008Z graph_break []
2025-12-04T10:11:57.6972134Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.6972838Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.6972912Z   if out == self.unknown_value:
2025-12-04T10:11:57.6973037Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6973128Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6973260Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6973659Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6973718Z graph_break []
2025-12-04T10:11:57.6973846Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6973933Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6974054Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6974409Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6974472Z graph_break []
2025-12-04T10:11:57.6974974Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7cb6908bcfc4804b.xml -
2025-12-04T10:11:57.6975080Z =========================== short test summary info ============================
2025-12-04T10:11:57.6976361Z FAILED [0.4066s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6976370Z 
2025-12-04T10:11:57.6976588Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6977110Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6977146Z 
2025-12-04T10:11:57.6977310Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6977421Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.6977541Z ================== 1 failed, 57 deselected, 2 rerun in 11.83s ==================
2025-12-04T10:11:57.6977605Z Got exit code 1
2025-12-04T10:11:57.6978078Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.6978326Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.6978593Z W1204 09:47:49.188000 49420 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.6978985Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cd6d9f99b37f4011.xml
2025-12-04T10:11:57.6979090Z ============================= test session starts ==============================
2025-12-04T10:11:57.6979297Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.6979367Z cachedir: .pytest_cache
2025-12-04T10:11:57.6979685Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.6979764Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.6979835Z configfile: pytest.ini
2025-12-04T10:11:57.6980153Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.6980289Z collecting ... collected 58 items / 27 deselected / 31 selected
2025-12-04T10:11:57.6980377Z stepcurrent: skipping 27 already run items.
2025-12-04T10:11:57.6980447Z Running 31 items in this shard
2025-12-04T10:11:57.6980451Z 
2025-12-04T10:11:57.6980950Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9343s] [  3%]
2025-12-04T10:11:57.6981471Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5512s] [  3%]
2025-12-04T10:11:57.6981914Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.5439s] [  3%]
2025-12-04T10:11:57.6981918Z 
2025-12-04T10:11:57.6982003Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.6982290Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6982368Z Traceback (most recent call last):
2025-12-04T10:11:57.6982678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6982749Z     method(*args, **kwargs)
2025-12-04T10:11:57.6983037Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6983100Z     method(*args, **kwargs)
2025-12-04T10:11:57.6983388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6983448Z     with policy():
2025-12-04T10:11:57.6983811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6983880Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6984671Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.6984710Z 
2025-12-04T10:11:57.6984841Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6985353Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6985358Z 
2025-12-04T10:11:57.6985520Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6985650Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6985745Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6986295Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6986425Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6986486Z graph_break []
2025-12-04T10:11:57.6986772Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6986844Z Traceback (most recent call last):
2025-12-04T10:11:57.6987336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6987406Z     method(*args, **kwargs)
2025-12-04T10:11:57.6987704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6987771Z     method(*args, **kwargs)
2025-12-04T10:11:57.6988057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6988188Z     with policy():
2025-12-04T10:11:57.6988492Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6988557Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6989362Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.6989366Z 
2025-12-04T10:11:57.6989495Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6990012Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6990017Z 
2025-12-04T10:11:57.6990176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6990308Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6990400Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6990943Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6991075Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6991134Z graph_break []
2025-12-04T10:11:57.6991327Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6991424Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6991544Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6992121Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6992183Z graph_break []
2025-12-04T10:11:57.6992272Z =================================== FAILURES ===================================
2025-12-04T10:11:57.6992561Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.6992635Z Traceback (most recent call last):
2025-12-04T10:11:57.6992938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6993001Z     method(*args, **kwargs)
2025-12-04T10:11:57.6993288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.6993354Z     method(*args, **kwargs)
2025-12-04T10:11:57.6993643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.6993702Z     with policy():
2025-12-04T10:11:57.6993994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.6994057Z     raise RuntimeError(msg)
2025-12-04T10:11:57.6994872Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.6994878Z 
2025-12-04T10:11:57.6995006Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.6995523Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.6995567Z 
2025-12-04T10:11:57.6995721Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.6995844Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6995936Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6996476Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6996606Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6996663Z graph_break []
2025-12-04T10:11:57.6996787Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6996877Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6996998Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6997529Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6997589Z graph_break []
2025-12-04T10:11:57.6997709Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.6997798Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.6997918Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.6998514Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.6998605Z graph_break []
2025-12-04T10:11:57.6999109Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cd6d9f99b37f4011.xml -
2025-12-04T10:11:57.6999218Z =========================== short test summary info ============================
2025-12-04T10:11:57.7000602Z FAILED [0.5439s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7000608Z 
2025-12-04T10:11:57.7000741Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7001257Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7001263Z 
2025-12-04T10:11:57.7001416Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7001523Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7001635Z ================== 1 failed, 27 deselected, 2 rerun in 3.05s ===================
2025-12-04T10:11:57.7001695Z Got exit code 1
2025-12-04T10:11:57.7001761Z Retrying single test...
2025-12-04T10:11:57.7002026Z W1204 09:47:58.818000 49602 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7002413Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d059803612c07abe.xml
2025-12-04T10:11:57.7002510Z ============================= test session starts ==============================
2025-12-04T10:11:57.7002777Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7002845Z cachedir: .pytest_cache
2025-12-04T10:11:57.7003151Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7003230Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7003293Z configfile: pytest.ini
2025-12-04T10:11:57.7003608Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7003745Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7004309Z stepcurrent: skipping 27 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7004384Z Running 1 items in this shard
2025-12-04T10:11:57.7004388Z 
2025-12-04T10:11:57.7005111Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:48:00.415014067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7005116Z 
2025-12-04T10:11:57.7005417Z [W1204 09:48:09.482255486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7005420Z 
2025-12-04T10:11:57.7005798Z [W1204 09:48:09.482524060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7005803Z 
2025-12-04T10:11:57.7006090Z [W1204 09:48:09.488645096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7006128Z 
2025-12-04T10:11:57.7006421Z [W1204 09:48:09.489229386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7006424Z 
2025-12-04T10:11:57.7006713Z [W1204 09:48:09.489402869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7006716Z 
2025-12-04T10:11:57.7007006Z [W1204 09:48:09.494927064 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7007010Z 
2025-12-04T10:11:57.7007297Z [W1204 09:48:09.495468174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7007300Z 
2025-12-04T10:11:57.7007594Z [W1204 09:48:09.495629156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7007599Z 
2025-12-04T10:11:57.7007682Z ('RERUN', {'yellow': True}) [11.0182s] [100%]
2025-12-04T10:11:57.7008407Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:48:10.295454427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7008411Z 
2025-12-04T10:11:57.7008700Z [W1204 09:48:10.296000186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7008704Z 
2025-12-04T10:11:57.7009001Z [W1204 09:48:10.296138049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7009008Z 
2025-12-04T10:11:57.7009296Z [W1204 09:48:10.299062429 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7009336Z 
2025-12-04T10:11:57.7009626Z [W1204 09:48:10.299509557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7009629Z 
2025-12-04T10:11:57.7009920Z [W1204 09:48:10.299644759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7009924Z 
2025-12-04T10:11:57.7010209Z [W1204 09:48:10.304161947 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7010213Z 
2025-12-04T10:11:57.7010504Z [W1204 09:48:10.304628815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7010508Z 
2025-12-04T10:11:57.7010794Z [W1204 09:48:10.304763547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7010799Z 
2025-12-04T10:11:57.7010882Z ('RERUN', {'yellow': True}) [0.4995s] [100%]
2025-12-04T10:11:57.7011595Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:48:10.794096739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7011599Z 
2025-12-04T10:11:57.7011898Z [W1204 09:48:10.794637819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7011902Z 
2025-12-04T10:11:57.7012256Z [W1204 09:48:10.794775551 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7012260Z 
2025-12-04T10:11:57.7012548Z [W1204 09:48:10.797719781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7012589Z 
2025-12-04T10:11:57.7012876Z [W1204 09:48:10.798167049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7012878Z 
2025-12-04T10:11:57.7013168Z [W1204 09:48:10.798301771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7013171Z 
2025-12-04T10:11:57.7013475Z [W1204 09:48:10.802953912 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7013478Z 
2025-12-04T10:11:57.7013766Z [W1204 09:48:10.803413050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7013769Z 
2025-12-04T10:11:57.7014061Z [W1204 09:48:10.803550042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7014064Z 
2025-12-04T10:11:57.7014125Z FAILED [0.4978s] [100%]
2025-12-04T10:11:57.7014130Z 
2025-12-04T10:11:57.7014217Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7014511Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7014586Z Traceback (most recent call last):
2025-12-04T10:11:57.7014897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7014966Z     method(*args, **kwargs)
2025-12-04T10:11:57.7015260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7015327Z     method(*args, **kwargs)
2025-12-04T10:11:57.7015612Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7015676Z     with policy():
2025-12-04T10:11:57.7015971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7016076Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7016880Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7016886Z 
2025-12-04T10:11:57.7017416Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7017958Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7017963Z 
2025-12-04T10:11:57.7018129Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7018263Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7018374Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7018930Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7019065Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7019125Z graph_break []
2025-12-04T10:11:57.7019379Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7020097Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7020219Z   if out == self.unknown_value:
2025-12-04T10:11:57.7020520Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7020594Z Traceback (most recent call last):
2025-12-04T10:11:57.7020900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7020969Z     method(*args, **kwargs)
2025-12-04T10:11:57.7021259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7021324Z     method(*args, **kwargs)
2025-12-04T10:11:57.7021616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7021675Z     with policy():
2025-12-04T10:11:57.7021969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7022037Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7022845Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7022854Z 
2025-12-04T10:11:57.7022984Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7023506Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7023510Z 
2025-12-04T10:11:57.7023674Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7023801Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7023955Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7024505Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7024631Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7024693Z graph_break []
2025-12-04T10:11:57.7024814Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7025511Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7025581Z   if out == self.unknown_value:
2025-12-04T10:11:57.7025704Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7025800Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7025934Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7026477Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7026541Z graph_break []
2025-12-04T10:11:57.7026624Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7026988Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7027062Z Traceback (most recent call last):
2025-12-04T10:11:57.7027362Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7027465Z     method(*args, **kwargs)
2025-12-04T10:11:57.7027755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7027821Z     method(*args, **kwargs)
2025-12-04T10:11:57.7028110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7028168Z     with policy():
2025-12-04T10:11:57.7028464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7028529Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7029333Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7029344Z 
2025-12-04T10:11:57.7029473Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7029990Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7029994Z 
2025-12-04T10:11:57.7030153Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7030278Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7030369Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7030913Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7031037Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7031154Z graph_break []
2025-12-04T10:11:57.7031276Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7031969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7032044Z   if out == self.unknown_value:
2025-12-04T10:11:57.7032166Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7032262Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7032382Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7032919Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7032984Z graph_break []
2025-12-04T10:11:57.7033104Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7033195Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7033315Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7033850Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7033979Z graph_break []
2025-12-04T10:11:57.7034468Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d059803612c07abe.xml -
2025-12-04T10:11:57.7034569Z =========================== short test summary info ============================
2025-12-04T10:11:57.7035877Z FAILED [0.4978s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7035883Z 
2025-12-04T10:11:57.7036011Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7036530Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7036533Z 
2025-12-04T10:11:57.7036687Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7036800Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7036915Z ================== 1 failed, 57 deselected, 2 rerun in 12.04s ==================
2025-12-04T10:11:57.7036975Z Got exit code 1
2025-12-04T10:11:57.7037038Z Retrying single test...
2025-12-04T10:11:57.7037303Z W1204 09:48:17.434000 49789 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7037695Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d2f99eb08b618a0a.xml
2025-12-04T10:11:57.7037792Z ============================= test session starts ==============================
2025-12-04T10:11:57.7038015Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7038080Z cachedir: .pytest_cache
2025-12-04T10:11:57.7038391Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7038512Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7038578Z configfile: pytest.ini
2025-12-04T10:11:57.7038891Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7039024Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7039590Z stepcurrent: skipping 27 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7039664Z Running 1 items in this shard
2025-12-04T10:11:57.7039667Z 
2025-12-04T10:11:57.7040434Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:48:19.031560224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7040443Z 
2025-12-04T10:11:57.7040746Z [W1204 09:48:28.272828993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7040750Z 
2025-12-04T10:11:57.7041039Z [W1204 09:48:28.273076717 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7041042Z 
2025-12-04T10:11:57.7041398Z [W1204 09:48:28.279038520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7041409Z 
2025-12-04T10:11:57.7041709Z [W1204 09:48:28.279580709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7041712Z 
2025-12-04T10:11:57.7042032Z [W1204 09:48:28.279757852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7042037Z 
2025-12-04T10:11:57.7042330Z [W1204 09:48:28.285274146 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7042334Z 
2025-12-04T10:11:57.7042621Z [W1204 09:48:28.285806115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7042624Z 
2025-12-04T10:11:57.7042913Z [W1204 09:48:28.285972608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7042919Z 
2025-12-04T10:11:57.7043000Z ('RERUN', {'yellow': True}) [11.1960s] [100%]
2025-12-04T10:11:57.7043726Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:48:29.089755750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7043732Z 
2025-12-04T10:11:57.7044022Z [W1204 09:48:29.090337090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7044025Z 
2025-12-04T10:11:57.7044319Z [W1204 09:48:29.090486943 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7044323Z 
2025-12-04T10:11:57.7044612Z [W1204 09:48:29.093535965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7044618Z 
2025-12-04T10:11:57.7044901Z [W1204 09:48:29.093996173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7044907Z 
2025-12-04T10:11:57.7045195Z [W1204 09:48:29.094134735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7045237Z 
2025-12-04T10:11:57.7045522Z [W1204 09:48:29.098764844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7045525Z 
2025-12-04T10:11:57.7045815Z [W1204 09:48:29.099225862 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7045818Z 
2025-12-04T10:11:57.7046102Z [W1204 09:48:29.099361704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7046105Z 
2025-12-04T10:11:57.7046190Z ('RERUN', {'yellow': True}) [0.4982s] [100%]
2025-12-04T10:11:57.7046905Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:48:29.585067686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7046911Z 
2025-12-04T10:11:57.7047206Z [W1204 09:48:29.585629596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7047208Z 
2025-12-04T10:11:57.7047494Z [W1204 09:48:29.585772148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7047496Z 
2025-12-04T10:11:57.7047781Z [W1204 09:48:29.588770630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7047855Z 
2025-12-04T10:11:57.7048143Z [W1204 09:48:29.589235468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7048146Z 
2025-12-04T10:11:57.7048433Z [W1204 09:48:29.589373411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7048472Z 
2025-12-04T10:11:57.7048760Z [W1204 09:48:29.594058680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7048763Z 
2025-12-04T10:11:57.7049054Z [W1204 09:48:29.594533509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7049057Z 
2025-12-04T10:11:57.7049347Z [W1204 09:48:29.594671611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7049351Z 
2025-12-04T10:11:57.7049414Z FAILED [0.4950s] [100%]
2025-12-04T10:11:57.7049417Z 
2025-12-04T10:11:57.7049502Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7049792Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7049868Z Traceback (most recent call last):
2025-12-04T10:11:57.7050180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7050245Z     method(*args, **kwargs)
2025-12-04T10:11:57.7050538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7050603Z     method(*args, **kwargs)
2025-12-04T10:11:57.7050891Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7050953Z     with policy():
2025-12-04T10:11:57.7051247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7051324Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7052125Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7052169Z 
2025-12-04T10:11:57.7052300Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7052819Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7052823Z 
2025-12-04T10:11:57.7052986Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7053118Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7053211Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7053763Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7053892Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7053954Z graph_break []
2025-12-04T10:11:57.7054076Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7054768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7054901Z   if out == self.unknown_value:
2025-12-04T10:11:57.7055201Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7055279Z Traceback (most recent call last):
2025-12-04T10:11:57.7055628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7055697Z     method(*args, **kwargs)
2025-12-04T10:11:57.7055983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7056043Z     method(*args, **kwargs)
2025-12-04T10:11:57.7056336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7056395Z     with policy():
2025-12-04T10:11:57.7056689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7056759Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7057555Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7057562Z 
2025-12-04T10:11:57.7057691Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7058206Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7058210Z 
2025-12-04T10:11:57.7058369Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7058493Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7058584Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7059129Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7059293Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7059354Z graph_break []
2025-12-04T10:11:57.7059476Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7060159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7060231Z   if out == self.unknown_value:
2025-12-04T10:11:57.7060353Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7060449Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7060578Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7061118Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7061180Z graph_break []
2025-12-04T10:11:57.7061269Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7061557Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7061635Z Traceback (most recent call last):
2025-12-04T10:11:57.7062080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7062148Z     method(*args, **kwargs)
2025-12-04T10:11:57.7062437Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7062498Z     method(*args, **kwargs)
2025-12-04T10:11:57.7062828Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7062887Z     with policy():
2025-12-04T10:11:57.7063184Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7063252Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7064067Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7064071Z 
2025-12-04T10:11:57.7064210Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7064726Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7064733Z 
2025-12-04T10:11:57.7064895Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7065021Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7065115Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7065663Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7065792Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7065851Z graph_break []
2025-12-04T10:11:57.7065974Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7066657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7066771Z   if out == self.unknown_value:
2025-12-04T10:11:57.7066890Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7066982Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7067104Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7067649Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7067711Z graph_break []
2025-12-04T10:11:57.7067833Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7067922Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7068047Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7068581Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7068641Z graph_break []
2025-12-04T10:11:57.7069127Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d2f99eb08b618a0a.xml -
2025-12-04T10:11:57.7069295Z =========================== short test summary info ============================
2025-12-04T10:11:57.7070570Z FAILED [0.4950s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7070608Z 
2025-12-04T10:11:57.7070737Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7071257Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7071261Z 
2025-12-04T10:11:57.7071420Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7071527Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7071642Z ================== 1 failed, 57 deselected, 2 rerun in 12.21s ==================
2025-12-04T10:11:57.7071704Z Got exit code 1
2025-12-04T10:11:57.7072174Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7072416Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.7072682Z W1204 09:48:36.292000 49976 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7073068Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76dedcabb72bb30d.xml
2025-12-04T10:11:57.7073168Z ============================= test session starts ==============================
2025-12-04T10:11:57.7073377Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7073444Z cachedir: .pytest_cache
2025-12-04T10:11:57.7073754Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7073869Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7073934Z configfile: pytest.ini
2025-12-04T10:11:57.7074253Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7074381Z collecting ... collected 58 items / 28 deselected / 30 selected
2025-12-04T10:11:57.7074469Z stepcurrent: skipping 28 already run items.
2025-12-04T10:11:57.7074540Z Running 30 items in this shard
2025-12-04T10:11:57.7074544Z 
2025-12-04T10:11:57.7075048Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8873s] [  3%]
2025-12-04T10:11:57.7075539Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4851s] [  3%]
2025-12-04T10:11:57.7075982Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4820s] [  3%]
2025-12-04T10:11:57.7075986Z 
2025-12-04T10:11:57.7076074Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7076364Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7076437Z Traceback (most recent call last):
2025-12-04T10:11:57.7076814Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7076882Z     method(*args, **kwargs)
2025-12-04T10:11:57.7077175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7077272Z     method(*args, **kwargs)
2025-12-04T10:11:57.7077559Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7077629Z     with policy():
2025-12-04T10:11:57.7077928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7077994Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7078807Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7078811Z 
2025-12-04T10:11:57.7078936Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7079477Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7079483Z 
2025-12-04T10:11:57.7079644Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7079780Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7079912Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7080268Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7080398Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7080457Z graph_break []
2025-12-04T10:11:57.7080748Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7080868Z Traceback (most recent call last):
2025-12-04T10:11:57.7081172Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7081237Z     method(*args, **kwargs)
2025-12-04T10:11:57.7081526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7081589Z     method(*args, **kwargs)
2025-12-04T10:11:57.7081883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7081942Z     with policy():
2025-12-04T10:11:57.7082237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7082306Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7083130Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7083136Z 
2025-12-04T10:11:57.7083266Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7083784Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7083788Z 
2025-12-04T10:11:57.7084030Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7084160Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7084252Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7084601Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7084775Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7084837Z graph_break []
2025-12-04T10:11:57.7084963Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7085051Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7085173Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7085515Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7085573Z graph_break []
2025-12-04T10:11:57.7085662Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7085951Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7086029Z Traceback (most recent call last):
2025-12-04T10:11:57.7086321Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7086382Z     method(*args, **kwargs)
2025-12-04T10:11:57.7086675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7086736Z     method(*args, **kwargs)
2025-12-04T10:11:57.7087020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7087081Z     with policy():
2025-12-04T10:11:57.7087377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7087444Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7088261Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7088305Z 
2025-12-04T10:11:57.7088438Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7088955Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7088959Z 
2025-12-04T10:11:57.7089115Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7089241Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7089330Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7089673Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7089799Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7089856Z graph_break []
2025-12-04T10:11:57.7089981Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7090068Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7090187Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7090595Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7090653Z graph_break []
2025-12-04T10:11:57.7090777Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7090864Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7091015Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7091358Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7091415Z graph_break []
2025-12-04T10:11:57.7091905Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76dedcabb72bb30d.xml -
2025-12-04T10:11:57.7092007Z =========================== short test summary info ============================
2025-12-04T10:11:57.7093297Z FAILED [0.4820s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7093307Z 
2025-12-04T10:11:57.7093430Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7093948Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7093952Z 
2025-12-04T10:11:57.7094109Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7094216Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7094338Z ================== 1 failed, 28 deselected, 2 rerun in 2.88s ===================
2025-12-04T10:11:57.7094397Z Got exit code 1
2025-12-04T10:11:57.7094461Z Retrying single test...
2025-12-04T10:11:57.7094728Z W1204 09:48:45.897000 50164 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7095152Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d102c48975f66f00.xml
2025-12-04T10:11:57.7095245Z ============================= test session starts ==============================
2025-12-04T10:11:57.7095464Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7095530Z cachedir: .pytest_cache
2025-12-04T10:11:57.7095839Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7095917Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7095981Z configfile: pytest.ini
2025-12-04T10:11:57.7096300Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7096431Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7097003Z stepcurrent: skipping 28 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7097076Z Running 1 items in this shard
2025-12-04T10:11:57.7097080Z 
2025-12-04T10:11:57.7097874Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:48:47.996320152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7097879Z 
2025-12-04T10:11:57.7098183Z [W1204 09:48:56.146412430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7098186Z 
2025-12-04T10:11:57.7098510Z [W1204 09:48:56.146673974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7098515Z 
2025-12-04T10:11:57.7098808Z [W1204 09:48:56.152644417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7098811Z 
2025-12-04T10:11:57.7099097Z [W1204 09:48:56.153235327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7099100Z 
2025-12-04T10:11:57.7099387Z [W1204 09:48:56.153411510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7099393Z 
2025-12-04T10:11:57.7099677Z [W1204 09:48:56.159002856 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7099680Z 
2025-12-04T10:11:57.7099965Z [W1204 09:48:56.159534675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7099970Z 
2025-12-04T10:11:57.7100255Z [W1204 09:48:56.159694707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7100259Z 
2025-12-04T10:11:57.7100342Z ('RERUN', {'yellow': True}) [11.0563s] [100%]
2025-12-04T10:11:57.7101071Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:48:57.373022273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7101077Z 
2025-12-04T10:11:57.7101365Z [W1204 09:48:57.373548973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7101369Z 
2025-12-04T10:11:57.7101654Z [W1204 09:48:57.373690215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7101696Z 
2025-12-04T10:11:57.7101982Z [W1204 09:48:57.376679706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7101985Z 
2025-12-04T10:11:57.7102276Z [W1204 09:48:57.377243446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7102280Z 
2025-12-04T10:11:57.7102565Z [W1204 09:48:57.377381748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7102569Z 
2025-12-04T10:11:57.7102859Z [W1204 09:48:57.381967147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7102862Z 
2025-12-04T10:11:57.7103145Z [W1204 09:48:57.382439665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7103152Z 
2025-12-04T10:11:57.7103438Z [W1204 09:48:57.382576277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7103441Z 
2025-12-04T10:11:57.7103519Z ('RERUN', {'yellow': True}) [0.4566s] [100%]
2025-12-04T10:11:57.7104239Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:48:57.827120167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7104243Z 
2025-12-04T10:11:57.7104601Z [W1204 09:48:57.827653116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7104605Z 
2025-12-04T10:11:57.7104892Z [W1204 09:48:57.827793688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7104931Z 
2025-12-04T10:11:57.7105228Z [W1204 09:48:57.830770069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7105231Z 
2025-12-04T10:11:57.7105519Z [W1204 09:48:57.831333039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7105522Z 
2025-12-04T10:11:57.7105812Z [W1204 09:48:57.831473191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7105816Z 
2025-12-04T10:11:57.7106103Z [W1204 09:48:57.836037260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7106107Z 
2025-12-04T10:11:57.7106392Z [W1204 09:48:57.836517458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7106398Z 
2025-12-04T10:11:57.7106682Z [W1204 09:48:57.836654700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7106685Z 
2025-12-04T10:11:57.7106747Z FAILED [0.4527s] [100%]
2025-12-04T10:11:57.7106750Z 
2025-12-04T10:11:57.7106839Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7107136Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7107215Z Traceback (most recent call last):
2025-12-04T10:11:57.7107534Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7107600Z     method(*args, **kwargs)
2025-12-04T10:11:57.7107896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7108001Z     method(*args, **kwargs)
2025-12-04T10:11:57.7108291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7108350Z     with policy():
2025-12-04T10:11:57.7108644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7108710Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7109520Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7109524Z 
2025-12-04T10:11:57.7109654Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7110175Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7110181Z 
2025-12-04T10:11:57.7110337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7110468Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7110566Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7110918Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7111129Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7111190Z graph_break []
2025-12-04T10:11:57.7111318Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7112043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7112121Z   if out == self.unknown_value:
2025-12-04T10:11:57.7112412Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7112483Z Traceback (most recent call last):
2025-12-04T10:11:57.7112789Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7112856Z     method(*args, **kwargs)
2025-12-04T10:11:57.7113151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7113216Z     method(*args, **kwargs)
2025-12-04T10:11:57.7113502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7113568Z     with policy():
2025-12-04T10:11:57.7113860Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7113923Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7114741Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7114745Z 
2025-12-04T10:11:57.7114874Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7115393Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7115436Z 
2025-12-04T10:11:57.7115597Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7115720Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7115817Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7116164Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7116293Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7116351Z graph_break []
2025-12-04T10:11:57.7116477Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7117297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7117374Z   if out == self.unknown_value:
2025-12-04T10:11:57.7117510Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7117602Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7117726Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7118070Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7118128Z graph_break []
2025-12-04T10:11:57.7118321Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7118621Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7118697Z Traceback (most recent call last):
2025-12-04T10:11:57.7119055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7119122Z     method(*args, **kwargs)
2025-12-04T10:11:57.7119412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7119477Z     method(*args, **kwargs)
2025-12-04T10:11:57.7119765Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7119825Z     with policy():
2025-12-04T10:11:57.7120161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7120226Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7121044Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7121052Z 
2025-12-04T10:11:57.7121176Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7121715Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7121719Z 
2025-12-04T10:11:57.7121877Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7122003Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7122096Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7122438Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7122623Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7122681Z graph_break []
2025-12-04T10:11:57.7122804Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7123488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7123557Z   if out == self.unknown_value:
2025-12-04T10:11:57.7123679Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7123771Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7123892Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7124233Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7124295Z graph_break []
2025-12-04T10:11:57.7124415Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7124505Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7124623Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7124965Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7125022Z graph_break []
2025-12-04T10:11:57.7125578Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d102c48975f66f00.xml -
2025-12-04T10:11:57.7125684Z =========================== short test summary info ============================
2025-12-04T10:11:57.7126979Z FAILED [0.4527s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7127019Z 
2025-12-04T10:11:57.7127149Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7127668Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7127672Z 
2025-12-04T10:11:57.7127832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7127938Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7128054Z ================== 1 failed, 57 deselected, 2 rerun in 11.99s ==================
2025-12-04T10:11:57.7128117Z Got exit code 1
2025-12-04T10:11:57.7128180Z Retrying single test...
2025-12-04T10:11:57.7128445Z W1204 09:49:04.466000 50357 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7128833Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e12a02efbce3f8f2.xml
2025-12-04T10:11:57.7128926Z ============================= test session starts ==============================
2025-12-04T10:11:57.7129146Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7129212Z cachedir: .pytest_cache
2025-12-04T10:11:57.7129518Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7129638Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7129701Z configfile: pytest.ini
2025-12-04T10:11:57.7130015Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7130143Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7130713Z stepcurrent: skipping 28 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7130787Z Running 1 items in this shard
2025-12-04T10:11:57.7130793Z 
2025-12-04T10:11:57.7131521Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:49:05.577562747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7131528Z 
2025-12-04T10:11:57.7131828Z [W1204 09:49:14.680533992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7131832Z 
2025-12-04T10:11:57.7132121Z [W1204 09:49:14.680781566 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7132124Z 
2025-12-04T10:11:57.7132418Z [W1204 09:49:14.686594506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7132421Z 
2025-12-04T10:11:57.7132774Z [W1204 09:49:14.687180466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7132778Z 
2025-12-04T10:11:57.7133071Z [W1204 09:49:14.687350149 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7133110Z 
2025-12-04T10:11:57.7133400Z [W1204 09:49:14.692941135 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7133403Z 
2025-12-04T10:11:57.7133689Z [W1204 09:49:14.693475344 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7133692Z 
2025-12-04T10:11:57.7133984Z [W1204 09:49:14.693637397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7133987Z 
2025-12-04T10:11:57.7134069Z ('RERUN', {'yellow': True}) [11.0229s] [100%]
2025-12-04T10:11:57.7134797Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:49:15.909138679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7134804Z 
2025-12-04T10:11:57.7135091Z [W1204 09:49:15.909670218 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7135095Z 
2025-12-04T10:11:57.7135382Z [W1204 09:49:15.909811320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7135385Z 
2025-12-04T10:11:57.7135671Z [W1204 09:49:15.912828062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7135674Z 
2025-12-04T10:11:57.7135964Z [W1204 09:49:15.913400282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7135968Z 
2025-12-04T10:11:57.7136253Z [W1204 09:49:15.913539715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7136311Z 
2025-12-04T10:11:57.7136597Z [W1204 09:49:15.918176324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7136604Z 
2025-12-04T10:11:57.7136890Z [W1204 09:49:15.918644752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7136893Z 
2025-12-04T10:11:57.7137180Z [W1204 09:49:15.918782205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7137183Z 
2025-12-04T10:11:57.7137263Z ('RERUN', {'yellow': True}) [0.4531s] [100%]
2025-12-04T10:11:57.7137985Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:49:16.360601137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7137991Z 
2025-12-04T10:11:57.7138282Z [W1204 09:49:16.361123296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7138285Z 
2025-12-04T10:11:57.7138571Z [W1204 09:49:16.361260898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7138575Z 
2025-12-04T10:11:57.7138861Z [W1204 09:49:16.364125508 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7138864Z 
2025-12-04T10:11:57.7139216Z [W1204 09:49:16.364687737 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7139219Z 
2025-12-04T10:11:57.7139518Z [W1204 09:49:16.364824990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7139554Z 
2025-12-04T10:11:57.7139845Z [W1204 09:49:16.369276246 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7139848Z 
2025-12-04T10:11:57.7140138Z [W1204 09:49:16.369728024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7140143Z 
2025-12-04T10:11:57.7140427Z [W1204 09:49:16.369861857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7140430Z 
2025-12-04T10:11:57.7140490Z FAILED [0.4506s] [100%]
2025-12-04T10:11:57.7140494Z 
2025-12-04T10:11:57.7140584Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7140877Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7140950Z Traceback (most recent call last):
2025-12-04T10:11:57.7141259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7141322Z     method(*args, **kwargs)
2025-12-04T10:11:57.7141615Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7141677Z     method(*args, **kwargs)
2025-12-04T10:11:57.7141962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7142023Z     with policy():
2025-12-04T10:11:57.7142318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7142390Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7143197Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7143239Z 
2025-12-04T10:11:57.7143368Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7143891Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7143894Z 
2025-12-04T10:11:57.7144051Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7144185Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7144279Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7144625Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7144759Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7144829Z graph_break []
2025-12-04T10:11:57.7144959Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7145652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7145722Z   if out == self.unknown_value:
2025-12-04T10:11:57.7146337Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7146417Z Traceback (most recent call last):
2025-12-04T10:11:57.7146711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7146809Z     method(*args, **kwargs)
2025-12-04T10:11:57.7147098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7147163Z     method(*args, **kwargs)
2025-12-04T10:11:57.7147449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7147508Z     with policy():
2025-12-04T10:11:57.7147801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7147866Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7148686Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7148692Z 
2025-12-04T10:11:57.7148818Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7149341Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7149344Z 
2025-12-04T10:11:57.7149498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7149621Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7149718Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7150062Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7150188Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7150248Z graph_break []
2025-12-04T10:11:57.7150408Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7151106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7151173Z   if out == self.unknown_value:
2025-12-04T10:11:57.7151294Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7151386Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7151510Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7151851Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7151909Z graph_break []
2025-12-04T10:11:57.7151992Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7152298Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7152370Z Traceback (most recent call last):
2025-12-04T10:11:57.7152665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7152730Z     method(*args, **kwargs)
2025-12-04T10:11:57.7153017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7153082Z     method(*args, **kwargs)
2025-12-04T10:11:57.7153433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7153493Z     with policy():
2025-12-04T10:11:57.7153791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7153893Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7154714Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7154718Z 
2025-12-04T10:11:57.7154843Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7155364Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7155371Z 
2025-12-04T10:11:57.7155527Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7155650Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7155745Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7156085Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7156208Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7156271Z graph_break []
2025-12-04T10:11:57.7156393Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7157085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7157153Z   if out == self.unknown_value:
2025-12-04T10:11:57.7157274Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7157421Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7157543Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7157885Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7157944Z graph_break []
2025-12-04T10:11:57.7158065Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7158157Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7158276Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7158617Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7158678Z graph_break []
2025-12-04T10:11:57.7159164Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e12a02efbce3f8f2.xml -
2025-12-04T10:11:57.7159270Z =========================== short test summary info ============================
2025-12-04T10:11:57.7160671Z FAILED [0.4506s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7160676Z 
2025-12-04T10:11:57.7160810Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7161330Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7161392Z 
2025-12-04T10:11:57.7161551Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7161671Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7161791Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ==================
2025-12-04T10:11:57.7161852Z Got exit code 1
2025-12-04T10:11:57.7162332Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7162575Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.7162842Z W1204 09:49:23.039000 50550 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7163230Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-835df1857998cf06.xml
2025-12-04T10:11:57.7163329Z ============================= test session starts ==============================
2025-12-04T10:11:57.7163536Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7163602Z cachedir: .pytest_cache
2025-12-04T10:11:57.7163905Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7163984Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7164051Z configfile: pytest.ini
2025-12-04T10:11:57.7164364Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7164490Z collecting ... collected 58 items / 29 deselected / 29 selected
2025-12-04T10:11:57.7164620Z stepcurrent: skipping 29 already run items.
2025-12-04T10:11:57.7164689Z Running 29 items in this shard
2025-12-04T10:11:57.7164693Z 
2025-12-04T10:11:57.7165188Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8542s] [  3%]
2025-12-04T10:11:57.7165672Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4390s] [  3%]
2025-12-04T10:11:57.7166119Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4334s] [  3%]
2025-12-04T10:11:57.7166123Z 
2025-12-04T10:11:57.7166207Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7166493Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7166569Z Traceback (most recent call last):
2025-12-04T10:11:57.7166875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7166939Z     method(*args, **kwargs)
2025-12-04T10:11:57.7167235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7167297Z     method(*args, **kwargs)
2025-12-04T10:11:57.7167650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7167716Z     with policy():
2025-12-04T10:11:57.7168011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7168079Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7168905Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7168911Z 
2025-12-04T10:11:57.7169036Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7169555Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7169559Z 
2025-12-04T10:11:57.7169728Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7169861Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7169953Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7170303Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7170433Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7170491Z graph_break []
2025-12-04T10:11:57.7170779Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7170852Z Traceback (most recent call last):
2025-12-04T10:11:57.7171144Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7171210Z     method(*args, **kwargs)
2025-12-04T10:11:57.7171499Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7171566Z     method(*args, **kwargs)
2025-12-04T10:11:57.7171856Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7171952Z     with policy():
2025-12-04T10:11:57.7172253Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7172319Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7173120Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7173129Z 
2025-12-04T10:11:57.7173254Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7173766Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7173773Z 
2025-12-04T10:11:57.7173928Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7174052Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7174147Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7174486Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7174612Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7174673Z graph_break []
2025-12-04T10:11:57.7174942Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7175032Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7175157Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7175542Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7175604Z graph_break []
2025-12-04T10:11:57.7175688Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7175970Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7176045Z Traceback (most recent call last):
2025-12-04T10:11:57.7176338Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7176404Z     method(*args, **kwargs)
2025-12-04T10:11:57.7176696Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7176757Z     method(*args, **kwargs)
2025-12-04T10:11:57.7177050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7177111Z     with policy():
2025-12-04T10:11:57.7177400Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7177469Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7178275Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7178279Z 
2025-12-04T10:11:57.7178405Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7178918Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7178959Z 
2025-12-04T10:11:57.7179116Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7179250Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7179342Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7179688Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7179810Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7179870Z graph_break []
2025-12-04T10:11:57.7179998Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7180083Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7180205Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7180545Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7180601Z graph_break []
2025-12-04T10:11:57.7180726Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7180818Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7180946Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7181358Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7181419Z graph_break []
2025-12-04T10:11:57.7181906Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-835df1857998cf06.xml -
2025-12-04T10:11:57.7182056Z =========================== short test summary info ============================
2025-12-04T10:11:57.7183335Z FAILED [0.4334s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7183340Z 
2025-12-04T10:11:57.7183471Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7183991Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7184001Z 
2025-12-04T10:11:57.7184160Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7184267Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7184389Z ================== 1 failed, 29 deselected, 2 rerun in 2.75s ===================
2025-12-04T10:11:57.7184447Z Got exit code 1
2025-12-04T10:11:57.7184512Z Retrying single test...
2025-12-04T10:11:57.7184779Z W1204 09:49:32.687000 50731 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7185175Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b90dc48e94da60a1.xml
2025-12-04T10:11:57.7185278Z ============================= test session starts ==============================
2025-12-04T10:11:57.7185488Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7185551Z cachedir: .pytest_cache
2025-12-04T10:11:57.7185905Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7185981Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7186047Z configfile: pytest.ini
2025-12-04T10:11:57.7186367Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7186494Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7187070Z stepcurrent: skipping 29 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7187142Z Running 1 items in this shard
2025-12-04T10:11:57.7187145Z 
2025-12-04T10:11:57.7187867Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:49:33.729351825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7187878Z 
2025-12-04T10:11:57.7188180Z [W1204 09:49:42.741671494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7188183Z 
2025-12-04T10:11:57.7188473Z [W1204 09:49:42.741926409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7188479Z 
2025-12-04T10:11:57.7188847Z [W1204 09:49:42.747831789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7188851Z 
2025-12-04T10:11:57.7189140Z [W1204 09:49:42.748423620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7189177Z 
2025-12-04T10:11:57.7189469Z [W1204 09:49:42.748590933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7189474Z 
2025-12-04T10:11:57.7189758Z [W1204 09:49:42.754175029 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7189761Z 
2025-12-04T10:11:57.7190050Z [W1204 09:49:42.754712308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7190053Z 
2025-12-04T10:11:57.7190345Z [W1204 09:49:42.754880291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7190348Z 
2025-12-04T10:11:57.7190432Z ('RERUN', {'yellow': True}) [10.8619s] [100%]
2025-12-04T10:11:57.7191153Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:49:43.929774104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7191160Z 
2025-12-04T10:11:57.7191447Z [W1204 09:49:43.930352243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7191454Z 
2025-12-04T10:11:57.7191740Z [W1204 09:49:43.930496516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7191743Z 
2025-12-04T10:11:57.7192029Z [W1204 09:49:43.933387725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7192032Z 
2025-12-04T10:11:57.7192329Z [W1204 09:49:43.933942555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7192332Z 
2025-12-04T10:11:57.7192622Z [W1204 09:49:43.934081517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7192661Z 
2025-12-04T10:11:57.7192950Z [W1204 09:49:43.938537663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7192953Z 
2025-12-04T10:11:57.7193237Z [W1204 09:49:43.938993931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7193241Z 
2025-12-04T10:11:57.7193527Z [W1204 09:49:43.939129893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7193533Z 
2025-12-04T10:11:57.7193612Z ('RERUN', {'yellow': True}) [0.4178s] [100%]
2025-12-04T10:11:57.7194331Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:49:44.346606967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7194337Z 
2025-12-04T10:11:57.7194626Z [W1204 09:49:44.347157466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7194629Z 
2025-12-04T10:11:57.7194914Z [W1204 09:49:44.347295879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7194924Z 
2025-12-04T10:11:57.7195213Z [W1204 09:49:44.350202289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7195282Z 
2025-12-04T10:11:57.7195569Z [W1204 09:49:44.350759028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7195572Z 
2025-12-04T10:11:57.7195864Z [W1204 09:49:44.350896171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7195901Z 
2025-12-04T10:11:57.7196186Z [W1204 09:49:44.355343777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7196189Z 
2025-12-04T10:11:57.7196480Z [W1204 09:49:44.355799335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7196483Z 
2025-12-04T10:11:57.7196770Z [W1204 09:49:44.355935398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7196773Z 
2025-12-04T10:11:57.7196840Z FAILED [0.4150s] [100%]
2025-12-04T10:11:57.7196843Z 
2025-12-04T10:11:57.7196926Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7197216Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7197297Z Traceback (most recent call last):
2025-12-04T10:11:57.7197608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7197675Z     method(*args, **kwargs)
2025-12-04T10:11:57.7197967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7198031Z     method(*args, **kwargs)
2025-12-04T10:11:57.7198333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7198399Z     with policy():
2025-12-04T10:11:57.7198696Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7198764Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7199556Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7199599Z 
2025-12-04T10:11:57.7199736Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7200292Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7200296Z 
2025-12-04T10:11:57.7200461Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7200588Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7200683Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7201035Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7201166Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7201224Z graph_break []
2025-12-04T10:11:57.7201352Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7202041Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7202186Z   if out == self.unknown_value:
2025-12-04T10:11:57.7202479Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7202552Z Traceback (most recent call last):
2025-12-04T10:11:57.7202853Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7202954Z     method(*args, **kwargs)
2025-12-04T10:11:57.7203250Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7203318Z     method(*args, **kwargs)
2025-12-04T10:11:57.7203613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7203676Z     with policy():
2025-12-04T10:11:57.7203972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7204040Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7204846Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7204853Z 
2025-12-04T10:11:57.7204977Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7205495Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7205499Z 
2025-12-04T10:11:57.7205663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7205794Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7205888Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7206235Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7206368Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7206465Z graph_break []
2025-12-04T10:11:57.7206589Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7207287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7207359Z   if out == self.unknown_value:
2025-12-04T10:11:57.7207483Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7207575Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7207697Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7208041Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7208102Z graph_break []
2025-12-04T10:11:57.7208189Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7208476Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7208548Z Traceback (most recent call last):
2025-12-04T10:11:57.7208847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7208910Z     method(*args, **kwargs)
2025-12-04T10:11:57.7209267Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7209330Z     method(*args, **kwargs)
2025-12-04T10:11:57.7209628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7209694Z     with policy():
2025-12-04T10:11:57.7210024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7210089Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7210893Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7210898Z 
2025-12-04T10:11:57.7211019Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7211536Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7211539Z 
2025-12-04T10:11:57.7211693Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7211822Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7211912Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7212254Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7212379Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7212437Z graph_break []
2025-12-04T10:11:57.7212559Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7213250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7213318Z   if out == self.unknown_value:
2025-12-04T10:11:57.7213493Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7213587Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7213710Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7214053Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7214110Z graph_break []
2025-12-04T10:11:57.7214232Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7214320Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7214443Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7214780Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7214838Z graph_break []
2025-12-04T10:11:57.7215324Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b90dc48e94da60a1.xml -
2025-12-04T10:11:57.7215425Z =========================== short test summary info ============================
2025-12-04T10:11:57.7216786Z FAILED [0.4150s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7216794Z 
2025-12-04T10:11:57.7216937Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7217846Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7217851Z 
2025-12-04T10:11:57.7218020Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7218130Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7218247Z ================== 1 failed, 57 deselected, 2 rerun in 11.72s ==================
2025-12-04T10:11:57.7218308Z Got exit code 1
2025-12-04T10:11:57.7218373Z Retrying single test...
2025-12-04T10:11:57.7218646Z W1204 09:49:51.046000 50917 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7219033Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a5fa72618c2406.xml
2025-12-04T10:11:57.7219138Z ============================= test session starts ==============================
2025-12-04T10:11:57.7219357Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7219422Z cachedir: .pytest_cache
2025-12-04T10:11:57.7219732Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7219810Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7219873Z configfile: pytest.ini
2025-12-04T10:11:57.7220192Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7220326Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7220902Z stepcurrent: skipping 29 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7221051Z Running 1 items in this shard
2025-12-04T10:11:57.7221055Z 
2025-12-04T10:11:57.7221781Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:49:52.102377034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7221786Z 
2025-12-04T10:11:57.7222086Z [W1204 09:50:01.190690714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7222090Z 
2025-12-04T10:11:57.7222386Z [W1204 09:50:01.190949308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7222389Z 
2025-12-04T10:11:57.7222679Z [W1204 09:50:01.196751798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7222685Z 
2025-12-04T10:11:57.7222980Z [W1204 09:50:01.197337837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7222983Z 
2025-12-04T10:11:57.7223276Z [W1204 09:50:01.197507511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7223279Z 
2025-12-04T10:11:57.7223564Z [W1204 09:50:01.202947574 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7223567Z 
2025-12-04T10:11:57.7223949Z [W1204 09:50:01.203489293 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7223953Z 
2025-12-04T10:11:57.7224241Z [W1204 09:50:01.203652446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7224289Z 
2025-12-04T10:11:57.7224373Z ('RERUN', {'yellow': True}) [10.9501s] [100%]
2025-12-04T10:11:57.7225096Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:50:02.375854989 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7225099Z 
2025-12-04T10:11:57.7225387Z [W1204 09:50:02.376432368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7225391Z 
2025-12-04T10:11:57.7225681Z [W1204 09:50:02.376577431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7225684Z 
2025-12-04T10:11:57.7225970Z [W1204 09:50:02.379478830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7225977Z 
2025-12-04T10:11:57.7226264Z [W1204 09:50:02.380052110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7226268Z 
2025-12-04T10:11:57.7226556Z [W1204 09:50:02.380194963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7226559Z 
2025-12-04T10:11:57.7226846Z [W1204 09:50:02.384668970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7226850Z 
2025-12-04T10:11:57.7227135Z [W1204 09:50:02.385126448 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7227138Z 
2025-12-04T10:11:57.7227426Z [W1204 09:50:02.385262060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7227434Z 
2025-12-04T10:11:57.7227549Z ('RERUN', {'yellow': True}) [0.4127s] [100%]
2025-12-04T10:11:57.7228258Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:50:02.785795632 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7228262Z 
2025-12-04T10:11:57.7228564Z [W1204 09:50:02.786347632 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7228568Z 
2025-12-04T10:11:57.7228858Z [W1204 09:50:02.786490804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7228861Z 
2025-12-04T10:11:57.7229151Z [W1204 09:50:02.789372683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7229155Z 
2025-12-04T10:11:57.7229445Z [W1204 09:50:02.789935613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7229448Z 
2025-12-04T10:11:57.7229736Z [W1204 09:50:02.790098846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7229739Z 
2025-12-04T10:11:57.7230025Z [W1204 09:50:02.794551662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7230028Z 
2025-12-04T10:11:57.7230384Z [W1204 09:50:02.795010100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7230387Z 
2025-12-04T10:11:57.7230675Z [W1204 09:50:02.795145413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7230711Z 
2025-12-04T10:11:57.7230773Z FAILED [0.4079s] [100%]
2025-12-04T10:11:57.7230778Z 
2025-12-04T10:11:57.7230868Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7231153Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7231230Z Traceback (most recent call last):
2025-12-04T10:11:57.7231537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7231606Z     method(*args, **kwargs)
2025-12-04T10:11:57.7231908Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7231971Z     method(*args, **kwargs)
2025-12-04T10:11:57.7232254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7232319Z     with policy():
2025-12-04T10:11:57.7232614Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7232685Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7233479Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7233484Z 
2025-12-04T10:11:57.7233621Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7234138Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7234142Z 
2025-12-04T10:11:57.7234298Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7234468Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7234565Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7234920Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7235047Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7235106Z graph_break []
2025-12-04T10:11:57.7235234Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7235939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7236013Z   if out == self.unknown_value:
2025-12-04T10:11:57.7236307Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7236382Z Traceback (most recent call last):
2025-12-04T10:11:57.7236685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7236749Z     method(*args, **kwargs)
2025-12-04T10:11:57.7237041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7237107Z     method(*args, **kwargs)
2025-12-04T10:11:57.7237468Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7237534Z     with policy():
2025-12-04T10:11:57.7237827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7237925Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7238740Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7238744Z 
2025-12-04T10:11:57.7238876Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7239397Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7239401Z 
2025-12-04T10:11:57.7239561Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7239686Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7239785Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7240190Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7240323Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7240383Z graph_break []
2025-12-04T10:11:57.7240506Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7241205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7241274Z   if out == self.unknown_value:
2025-12-04T10:11:57.7241397Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7241487Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7241672Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7242015Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7242073Z graph_break []
2025-12-04T10:11:57.7242156Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7242452Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7242524Z Traceback (most recent call last):
2025-12-04T10:11:57.7242829Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7242892Z     method(*args, **kwargs)
2025-12-04T10:11:57.7243183Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7243253Z     method(*args, **kwargs)
2025-12-04T10:11:57.7243540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7243598Z     with policy():
2025-12-04T10:11:57.7243895Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7243960Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7244841Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7244845Z 
2025-12-04T10:11:57.7244969Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7245518Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7245524Z 
2025-12-04T10:11:57.7245682Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7245807Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7245899Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7246245Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7246377Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7246443Z graph_break []
2025-12-04T10:11:57.7246568Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7247259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7247330Z   if out == self.unknown_value:
2025-12-04T10:11:57.7247451Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7247544Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7247667Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7248016Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7248073Z graph_break []
2025-12-04T10:11:57.7248195Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7248285Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7248407Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7248790Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7248849Z graph_break []
2025-12-04T10:11:57.7249332Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a5fa72618c2406.xml -
2025-12-04T10:11:57.7249437Z =========================== short test summary info ============================
2025-12-04T10:11:57.7250724Z FAILED [0.4079s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7250731Z 
2025-12-04T10:11:57.7250859Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7251371Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7251375Z 
2025-12-04T10:11:57.7251532Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7251704Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7251823Z ================== 1 failed, 57 deselected, 2 rerun in 11.79s ==================
2025-12-04T10:11:57.7251885Z Got exit code 1
2025-12-04T10:11:57.7252358Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7252642Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.7252906Z W1204 09:50:09.437000 51103 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7253290Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32c3413eac3481c3.xml
2025-12-04T10:11:57.7253388Z ============================= test session starts ==============================
2025-12-04T10:11:57.7253601Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7253668Z cachedir: .pytest_cache
2025-12-04T10:11:57.7253979Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7254059Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7254127Z configfile: pytest.ini
2025-12-04T10:11:57.7254442Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7254570Z collecting ... collected 58 items / 30 deselected / 28 selected
2025-12-04T10:11:57.7254659Z stepcurrent: skipping 30 already run items.
2025-12-04T10:11:57.7254729Z Running 28 items in this shard
2025-12-04T10:11:57.7254732Z 
2025-12-04T10:11:57.7255230Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9267s] [  3%]
2025-12-04T10:11:57.7255722Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5301s] [  3%]
2025-12-04T10:11:57.7256163Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.5241s] [  3%]
2025-12-04T10:11:57.7256204Z 
2025-12-04T10:11:57.7256291Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7256575Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7256650Z Traceback (most recent call last):
2025-12-04T10:11:57.7256958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7257024Z     method(*args, **kwargs)
2025-12-04T10:11:57.7257323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7257385Z     method(*args, **kwargs)
2025-12-04T10:11:57.7257682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7257743Z     with policy():
2025-12-04T10:11:57.7258038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7258110Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7258905Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7258975Z 
2025-12-04T10:11:57.7259110Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7259630Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7259669Z 
2025-12-04T10:11:57.7259828Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7259958Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7260052Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7260604Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7260735Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7260792Z graph_break []
2025-12-04T10:11:57.7261088Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7261161Z Traceback (most recent call last):
2025-12-04T10:11:57.7261471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7261539Z     method(*args, **kwargs)
2025-12-04T10:11:57.7261835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7261901Z     method(*args, **kwargs)
2025-12-04T10:11:57.7262189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7262247Z     with policy():
2025-12-04T10:11:57.7262549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7262615Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7263426Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7263468Z 
2025-12-04T10:11:57.7263593Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7264105Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7264111Z 
2025-12-04T10:11:57.7264268Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7264397Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7264492Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7265037Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7265168Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7265226Z graph_break []
2025-12-04T10:11:57.7265358Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7265451Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7265574Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7266181Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7266243Z graph_break []
2025-12-04T10:11:57.7266325Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7266614Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7266738Z Traceback (most recent call last):
2025-12-04T10:11:57.7267039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7267105Z     method(*args, **kwargs)
2025-12-04T10:11:57.7267408Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7267471Z     method(*args, **kwargs)
2025-12-04T10:11:57.7267767Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7267829Z     with policy():
2025-12-04T10:11:57.7268127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7268191Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7269001Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7269010Z 
2025-12-04T10:11:57.7269134Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7269651Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7269655Z 
2025-12-04T10:11:57.7269817Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7269942Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7270033Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7270579Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7270751Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7270816Z graph_break []
2025-12-04T10:11:57.7270938Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7271027Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7271152Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7271694Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7271754Z graph_break []
2025-12-04T10:11:57.7271876Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7271965Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7272088Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7272625Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7272686Z graph_break []
2025-12-04T10:11:57.7273243Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32c3413eac3481c3.xml -
2025-12-04T10:11:57.7273344Z =========================== short test summary info ============================
2025-12-04T10:11:57.7274623Z FAILED [0.5241s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7274661Z 
2025-12-04T10:11:57.7274787Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7275312Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7275315Z 
2025-12-04T10:11:57.7275468Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7275574Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7275697Z ================== 1 failed, 30 deselected, 2 rerun in 3.01s ===================
2025-12-04T10:11:57.7275755Z Got exit code 1
2025-12-04T10:11:57.7275823Z Retrying single test...
2025-12-04T10:11:57.7276083Z W1204 09:50:19.064000 51285 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7276469Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b9498a5ec773296.xml
2025-12-04T10:11:57.7276567Z ============================= test session starts ==============================
2025-12-04T10:11:57.7276788Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7276859Z cachedir: .pytest_cache
2025-12-04T10:11:57.7277171Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7277248Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7277358Z configfile: pytest.ini
2025-12-04T10:11:57.7277672Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7277805Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7278369Z stepcurrent: skipping 30 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7278437Z Running 1 items in this shard
2025-12-04T10:11:57.7278441Z 
2025-12-04T10:11:57.7279175Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:50:20.662417415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7279181Z 
2025-12-04T10:11:57.7279478Z [W1204 09:50:29.674002233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7279482Z 
2025-12-04T10:11:57.7279778Z [W1204 09:50:29.674268687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7279781Z 
2025-12-04T10:11:57.7280107Z [W1204 09:50:29.680496844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7280110Z 
2025-12-04T10:11:57.7280486Z [W1204 09:50:29.681112204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7280490Z 
2025-12-04T10:11:57.7280779Z [W1204 09:50:29.681288057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7280815Z 
2025-12-04T10:11:57.7281106Z [W1204 09:50:29.686799522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7281111Z 
2025-12-04T10:11:57.7281413Z [W1204 09:50:29.687338041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7281417Z 
2025-12-04T10:11:57.7281704Z [W1204 09:50:29.687500894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7281712Z 
2025-12-04T10:11:57.7281793Z ('RERUN', {'yellow': True}) [10.9704s] [100%]
2025-12-04T10:11:57.7282517Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:50:30.492915707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7282523Z 
2025-12-04T10:11:57.7282816Z [W1204 09:50:30.493476687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7282821Z 
2025-12-04T10:11:57.7283107Z [W1204 09:50:30.493615639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7283111Z 
2025-12-04T10:11:57.7283401Z [W1204 09:50:30.496485908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7283404Z 
2025-12-04T10:11:57.7283691Z [W1204 09:50:30.496935696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7283694Z 
2025-12-04T10:11:57.7283984Z [W1204 09:50:30.497076709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7283987Z 
2025-12-04T10:11:57.7284272Z [W1204 09:50:30.501652387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7284391Z 
2025-12-04T10:11:57.7284687Z [W1204 09:50:30.502116105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7284692Z 
2025-12-04T10:11:57.7284984Z [W1204 09:50:30.502253307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7284989Z 
2025-12-04T10:11:57.7285068Z ('RERUN', {'yellow': True}) [0.5014s] [100%]
2025-12-04T10:11:57.7285789Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:50:31.992764154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7285793Z 
2025-12-04T10:11:57.7286084Z [W1204 09:50:31.993299073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7286089Z 
2025-12-04T10:11:57.7286379Z [W1204 09:50:31.993439555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7286382Z 
2025-12-04T10:11:57.7286669Z [W1204 09:50:31.996344355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7286672Z 
2025-12-04T10:11:57.7286959Z [W1204 09:50:31.996796433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7287029Z 
2025-12-04T10:11:57.7287320Z [W1204 09:50:31.996934205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7287323Z 
2025-12-04T10:11:57.7287611Z [W1204 09:50:31.001491483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7287648Z 
2025-12-04T10:11:57.7287939Z [W1204 09:50:31.001948341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7287942Z 
2025-12-04T10:11:57.7288228Z [W1204 09:50:31.002084503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7288235Z 
2025-12-04T10:11:57.7288294Z FAILED [0.4984s] [100%]
2025-12-04T10:11:57.7288298Z 
2025-12-04T10:11:57.7288384Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7288676Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7288749Z Traceback (most recent call last):
2025-12-04T10:11:57.7289055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7289127Z     method(*args, **kwargs)
2025-12-04T10:11:57.7289419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7289487Z     method(*args, **kwargs)
2025-12-04T10:11:57.7289784Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7289844Z     with policy():
2025-12-04T10:11:57.7290142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7290210Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7291003Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7291046Z 
2025-12-04T10:11:57.7291175Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7291691Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7291698Z 
2025-12-04T10:11:57.7291854Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7291986Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7292085Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7292630Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7292760Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7292824Z graph_break []
2025-12-04T10:11:57.7292948Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7293642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7293713Z   if out == self.unknown_value:
2025-12-04T10:11:57.7294092Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7294173Z Traceback (most recent call last):
2025-12-04T10:11:57.7294471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7294571Z     method(*args, **kwargs)
2025-12-04T10:11:57.7294868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7294932Z     method(*args, **kwargs)
2025-12-04T10:11:57.7295226Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7295285Z     with policy():
2025-12-04T10:11:57.7295581Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7295649Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7296446Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7296451Z 
2025-12-04T10:11:57.7296581Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7297092Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7297096Z 
2025-12-04T10:11:57.7297255Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7297380Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7297472Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7298018Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7298143Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7298244Z graph_break []
2025-12-04T10:11:57.7298366Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7299053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7299126Z   if out == self.unknown_value:
2025-12-04T10:11:57.7299246Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7299336Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7299465Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7300003Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7300067Z graph_break []
2025-12-04T10:11:57.7300150Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7300435Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7300510Z Traceback (most recent call last):
2025-12-04T10:11:57.7300811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7300878Z     method(*args, **kwargs)
2025-12-04T10:11:57.7301239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7301304Z     method(*args, **kwargs)
2025-12-04T10:11:57.7301596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7301688Z     with policy():
2025-12-04T10:11:57.7301985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7302063Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7302872Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7302877Z 
2025-12-04T10:11:57.7303062Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7303865Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7303871Z 
2025-12-04T10:11:57.7304042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7304176Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7304270Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7304814Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7304944Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7305005Z graph_break []
2025-12-04T10:11:57.7305135Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7305829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7305976Z   if out == self.unknown_value:
2025-12-04T10:11:57.7306104Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7306197Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7306324Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7306872Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7306934Z graph_break []
2025-12-04T10:11:57.7307059Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7307148Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7307270Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7307805Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7307866Z graph_break []
2025-12-04T10:11:57.7308364Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b9498a5ec773296.xml -
2025-12-04T10:11:57.7308468Z =========================== short test summary info ============================
2025-12-04T10:11:57.7309834Z FAILED [0.4984s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7309881Z 
2025-12-04T10:11:57.7310016Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7310540Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7310544Z 
2025-12-04T10:11:57.7310709Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7310819Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7310936Z ================== 1 failed, 57 deselected, 2 rerun in 11.99s ==================
2025-12-04T10:11:57.7310997Z Got exit code 1
2025-12-04T10:11:57.7311065Z Retrying single test...
2025-12-04T10:11:57.7311330Z W1204 09:50:37.603000 51472 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7311718Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d690a534f220c503.xml
2025-12-04T10:11:57.7311813Z ============================= test session starts ==============================
2025-12-04T10:11:57.7312022Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7312092Z cachedir: .pytest_cache
2025-12-04T10:11:57.7312402Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7312481Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7312551Z configfile: pytest.ini
2025-12-04T10:11:57.7312866Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7313001Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7313609Z stepcurrent: skipping 30 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7313679Z Running 1 items in this shard
2025-12-04T10:11:57.7313683Z 
2025-12-04T10:11:57.7314412Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:50:39.196800294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7314419Z 
2025-12-04T10:11:57.7314719Z [W1204 09:50:48.201438486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7314723Z 
2025-12-04T10:11:57.7315017Z [W1204 09:50:48.201701410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7315024Z 
2025-12-04T10:11:57.7315314Z [W1204 09:50:48.207715623 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7315317Z 
2025-12-04T10:11:57.7315611Z [W1204 09:50:48.208316733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7315614Z 
2025-12-04T10:11:57.7315901Z [W1204 09:50:48.208499646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7315904Z 
2025-12-04T10:11:57.7316259Z [W1204 09:50:48.213916809 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7316263Z 
2025-12-04T10:11:57.7316550Z [W1204 09:50:48.214453989 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7316588Z 
2025-12-04T10:11:57.7316884Z [W1204 09:50:48.214613381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7316888Z 
2025-12-04T10:11:57.7316969Z ('RERUN', {'yellow': True}) [10.9480s] [100%]
2025-12-04T10:11:57.7317827Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:50:49.007377873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7317832Z 
2025-12-04T10:11:57.7318138Z [W1204 09:50:49.007901652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7318142Z 
2025-12-04T10:11:57.7318430Z [W1204 09:50:49.008040845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7318437Z 
2025-12-04T10:11:57.7318730Z [W1204 09:50:49.010930274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7318734Z 
2025-12-04T10:11:57.7319022Z [W1204 09:50:49.011378841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7319025Z 
2025-12-04T10:11:57.7319316Z [W1204 09:50:49.011515724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7319320Z 
2025-12-04T10:11:57.7319610Z [W1204 09:50:49.015890909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7319613Z 
2025-12-04T10:11:57.7319972Z [W1204 09:50:49.016341796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7320052Z 
2025-12-04T10:11:57.7320359Z [W1204 09:50:49.016478519 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7320363Z 
2025-12-04T10:11:57.7320444Z ('RERUN', {'yellow': True}) [0.4940s] [100%]
2025-12-04T10:11:57.7321179Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:50:49.500062184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7321183Z 
2025-12-04T10:11:57.7321478Z [W1204 09:50:49.500628053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7321482Z 
2025-12-04T10:11:57.7321790Z [W1204 09:50:49.500773166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7321797Z 
2025-12-04T10:11:57.7322086Z [W1204 09:50:49.503631425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7322089Z 
2025-12-04T10:11:57.7322384Z [W1204 09:50:49.504083113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7322387Z 
2025-12-04T10:11:57.7322674Z [W1204 09:50:49.504222095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7322677Z 
2025-12-04T10:11:57.7323084Z [W1204 09:50:49.508700612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7323088Z 
2025-12-04T10:11:57.7323382Z [W1204 09:50:49.509156770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7323431Z 
2025-12-04T10:11:57.7323728Z [W1204 09:50:49.509297213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7323731Z 
2025-12-04T10:11:57.7323793Z FAILED [0.4881s] [100%]
2025-12-04T10:11:57.7323796Z 
2025-12-04T10:11:57.7323880Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7324181Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7324260Z Traceback (most recent call last):
2025-12-04T10:11:57.7328875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7328979Z     method(*args, **kwargs)
2025-12-04T10:11:57.7329325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7329402Z     method(*args, **kwargs)
2025-12-04T10:11:57.7329728Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7329796Z     with policy():
2025-12-04T10:11:57.7330103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7330171Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7330988Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7330993Z 
2025-12-04T10:11:57.7331131Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7331659Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7331736Z 
2025-12-04T10:11:57.7331904Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7332043Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7332140Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7332698Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7332835Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7332893Z graph_break []
2025-12-04T10:11:57.7333022Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7333730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7333804Z   if out == self.unknown_value:
2025-12-04T10:11:57.7334105Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7334193Z Traceback (most recent call last):
2025-12-04T10:11:57.7334513Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7334658Z     method(*args, **kwargs)
2025-12-04T10:11:57.7334953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7335017Z     method(*args, **kwargs)
2025-12-04T10:11:57.7335340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7335406Z     with policy():
2025-12-04T10:11:57.7335715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7335784Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7336596Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7336604Z 
2025-12-04T10:11:57.7336735Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7337252Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7337262Z 
2025-12-04T10:11:57.7337424Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7337553Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7337650Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7338203Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7338333Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7338396Z graph_break []
2025-12-04T10:11:57.7338523Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7339216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7339327Z   if out == self.unknown_value:
2025-12-04T10:11:57.7339450Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7339546Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7339666Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7340208Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7340267Z graph_break []
2025-12-04T10:11:57.7340349Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7340641Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7340728Z Traceback (most recent call last):
2025-12-04T10:11:57.7341034Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7341101Z     method(*args, **kwargs)
2025-12-04T10:11:57.7341391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7341459Z     method(*args, **kwargs)
2025-12-04T10:11:57.7341749Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7341876Z     with policy():
2025-12-04T10:11:57.7342179Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7342245Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7343092Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7343097Z 
2025-12-04T10:11:57.7343226Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7343744Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7343752Z 
2025-12-04T10:11:57.7343913Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7344040Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7344134Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7344680Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7344806Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7344867Z graph_break []
2025-12-04T10:11:57.7344988Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7345677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7345745Z   if out == self.unknown_value:
2025-12-04T10:11:57.7345867Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7345959Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7346128Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7346669Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7346730Z graph_break []
2025-12-04T10:11:57.7346851Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7346942Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7347060Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7347599Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7347657Z graph_break []
2025-12-04T10:11:57.7348155Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d690a534f220c503.xml -
2025-12-04T10:11:57.7348260Z =========================== short test summary info ============================
2025-12-04T10:11:57.7349606Z FAILED [0.4881s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7349612Z 
2025-12-04T10:11:57.7349739Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7350286Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7350292Z 
2025-12-04T10:11:57.7350450Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7350554Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7350670Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ==================
2025-12-04T10:11:57.7350731Z Got exit code 1
2025-12-04T10:11:57.7351200Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7351446Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.7351711Z W1204 09:50:56.102000 51658 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7352098Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8635fba9f5b5afed.xml
2025-12-04T10:11:57.7352199Z ============================= test session starts ==============================
2025-12-04T10:11:57.7352406Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7352472Z cachedir: .pytest_cache
2025-12-04T10:11:57.7352782Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7352861Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7352930Z configfile: pytest.ini
2025-12-04T10:11:57.7353241Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7353370Z collecting ... collected 58 items / 31 deselected / 27 selected
2025-12-04T10:11:57.7353503Z stepcurrent: skipping 31 already run items.
2025-12-04T10:11:57.7353573Z Running 27 items in this shard
2025-12-04T10:11:57.7353577Z 
2025-12-04T10:11:57.7354086Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0122s] [  3%]
2025-12-04T10:11:57.7354575Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6090s] [  3%]
2025-12-04T10:11:57.7355019Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.6100s] [  3%]
2025-12-04T10:11:57.7355023Z 
2025-12-04T10:11:57.7355108Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7355403Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7355478Z Traceback (most recent call last):
2025-12-04T10:11:57.7355790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7355855Z     method(*args, **kwargs)
2025-12-04T10:11:57.7356159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7356222Z     method(*args, **kwargs)
2025-12-04T10:11:57.7356602Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7356662Z     with policy():
2025-12-04T10:11:57.7356954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7357058Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7357863Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7357867Z 
2025-12-04T10:11:57.7358001Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7358528Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7358532Z 
2025-12-04T10:11:57.7358690Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7358824Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7358920Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7359277Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7359405Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7359463Z graph_break []
2025-12-04T10:11:57.7359755Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7359827Z Traceback (most recent call last):
2025-12-04T10:11:57.7360193Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7360260Z     method(*args, **kwargs)
2025-12-04T10:11:57.7360548Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7360614Z     method(*args, **kwargs)
2025-12-04T10:11:57.7360958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7361017Z     with policy():
2025-12-04T10:11:57.7361310Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7361375Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7362202Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7362206Z 
2025-12-04T10:11:57.7362329Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7362848Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7362858Z 
2025-12-04T10:11:57.7363012Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7363137Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7363233Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7363583Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7363775Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7363840Z graph_break []
2025-12-04T10:11:57.7363961Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7364052Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7364208Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7364558Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7364620Z graph_break []
2025-12-04T10:11:57.7364704Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7364995Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7365070Z Traceback (most recent call last):
2025-12-04T10:11:57.7365371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7365436Z     method(*args, **kwargs)
2025-12-04T10:11:57.7365726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7365791Z     method(*args, **kwargs)
2025-12-04T10:11:57.7366081Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7366140Z     with policy():
2025-12-04T10:11:57.7366429Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7366499Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7367325Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7367329Z 
2025-12-04T10:11:57.7367453Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7367972Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7368013Z 
2025-12-04T10:11:57.7368169Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7368291Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7368386Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7368730Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7368854Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7368912Z graph_break []
2025-12-04T10:11:57.7369031Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7369117Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7369241Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7369590Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7369648Z graph_break []
2025-12-04T10:11:57.7369773Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7369859Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7369982Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7370384Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7370443Z graph_break []
2025-12-04T10:11:57.7370934Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8635fba9f5b5afed.xml -
2025-12-04T10:11:57.7371068Z =========================== short test summary info ============================
2025-12-04T10:11:57.7372373Z FAILED [0.6100s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7372380Z 
2025-12-04T10:11:57.7372502Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7373018Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7373025Z 
2025-12-04T10:11:57.7373179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7373282Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7373401Z ================== 1 failed, 31 deselected, 2 rerun in 3.26s ===================
2025-12-04T10:11:57.7373458Z Got exit code 1
2025-12-04T10:11:57.7373522Z Retrying single test...
2025-12-04T10:11:57.7373791Z W1204 09:51:05.938000 51847 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7374181Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2adccf8b9e051d5a.xml
2025-12-04T10:11:57.7374279Z ============================= test session starts ==============================
2025-12-04T10:11:57.7374483Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7374598Z cachedir: .pytest_cache
2025-12-04T10:11:57.7374909Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7374985Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7375052Z configfile: pytest.ini
2025-12-04T10:11:57.7375364Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7375490Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7376063Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7376136Z Running 1 items in this shard
2025-12-04T10:11:57.7376139Z 
2025-12-04T10:11:57.7376877Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:51:07.158767824 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7376883Z 
2025-12-04T10:11:57.7377179Z [W1204 09:51:16.355214618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7377183Z 
2025-12-04T10:11:57.7377479Z [W1204 09:51:16.355463442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7377554Z 
2025-12-04T10:11:57.7377842Z [W1204 09:51:16.361361582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7377845Z 
2025-12-04T10:11:57.7378134Z [W1204 09:51:16.361930962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7378176Z 
2025-12-04T10:11:57.7378474Z [W1204 09:51:16.362097215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7378478Z 
2025-12-04T10:11:57.7378764Z [W1204 09:51:16.367490917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7378768Z 
2025-12-04T10:11:57.7379058Z [W1204 09:51:16.368040227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7379062Z 
2025-12-04T10:11:57.7379350Z [W1204 09:51:16.368212370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7379354Z 
2025-12-04T10:11:57.7379440Z ('RERUN', {'yellow': True}) [11.2211s] [100%]
2025-12-04T10:11:57.7380167Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:51:17.705141106 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7380172Z 
2025-12-04T10:11:57.7380462Z [W1204 09:51:17.705657125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7380466Z 
2025-12-04T10:11:57.7380752Z [W1204 09:51:17.705793807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7380755Z 
2025-12-04T10:11:57.7381049Z [W1204 09:51:17.708647926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7381052Z 
2025-12-04T10:11:57.7381339Z [W1204 09:51:17.709191725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7381380Z 
2025-12-04T10:11:57.7381670Z [W1204 09:51:17.709329258 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7381673Z 
2025-12-04T10:11:57.7381967Z [W1204 09:51:17.713756273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7381970Z 
2025-12-04T10:11:57.7382255Z [W1204 09:51:17.714209641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7382258Z 
2025-12-04T10:11:57.7382548Z [W1204 09:51:17.714350903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7382551Z 
2025-12-04T10:11:57.7382629Z ('RERUN', {'yellow': True}) [0.5823s] [100%]
2025-12-04T10:11:57.7383359Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:51:18.285017860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7383369Z 
2025-12-04T10:11:57.7383661Z [W1204 09:51:18.285530789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7383664Z 
2025-12-04T10:11:57.7383953Z [W1204 09:51:18.285669752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7383956Z 
2025-12-04T10:11:57.7384324Z [W1204 09:51:18.288483690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7384328Z 
2025-12-04T10:11:57.7384615Z [W1204 09:51:18.289018100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7384655Z 
2025-12-04T10:11:57.7384944Z [W1204 09:51:18.289159312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7384947Z 
2025-12-04T10:11:57.7385233Z [W1204 09:51:18.293565987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7385236Z 
2025-12-04T10:11:57.7385524Z [W1204 09:51:18.294018995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7385528Z 
2025-12-04T10:11:57.7385817Z [W1204 09:51:18.294159357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7385820Z 
2025-12-04T10:11:57.7385882Z FAILED [0.5787s] [100%]
2025-12-04T10:11:57.7385885Z 
2025-12-04T10:11:57.7385965Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7386267Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7386348Z Traceback (most recent call last):
2025-12-04T10:11:57.7386655Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7386723Z     method(*args, **kwargs)
2025-12-04T10:11:57.7387016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7387080Z     method(*args, **kwargs)
2025-12-04T10:11:57.7387372Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7387430Z     with policy():
2025-12-04T10:11:57.7387729Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7387832Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7388639Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7388643Z 
2025-12-04T10:11:57.7388773Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7389296Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7389300Z 
2025-12-04T10:11:57.7389458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7389582Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7389678Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7390028Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7390152Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7390210Z graph_break []
2025-12-04T10:11:57.7390331Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7391098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7391176Z   if out == self.unknown_value:
2025-12-04T10:11:57.7391470Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7391580Z Traceback (most recent call last):
2025-12-04T10:11:57.7391884Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7391949Z     method(*args, **kwargs)
2025-12-04T10:11:57.7392245Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7392307Z     method(*args, **kwargs)
2025-12-04T10:11:57.7392595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7392658Z     with policy():
2025-12-04T10:11:57.7392955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7393022Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7393847Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7393853Z 
2025-12-04T10:11:57.7393984Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7394506Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7394510Z 
2025-12-04T10:11:57.7394671Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7394804Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7394898Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7395247Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7395413Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7395473Z graph_break []
2025-12-04T10:11:57.7395598Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7396287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7396356Z   if out == self.unknown_value:
2025-12-04T10:11:57.7396496Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7396587Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7396714Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7397059Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7397117Z graph_break []
2025-12-04T10:11:57.7397203Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7397494Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7397571Z Traceback (most recent call last):
2025-12-04T10:11:57.7397868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7398075Z     method(*args, **kwargs)
2025-12-04T10:11:57.7398371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7398432Z     method(*args, **kwargs)
2025-12-04T10:11:57.7398755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7398819Z     with policy():
2025-12-04T10:11:57.7399108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7399176Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7400039Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7400044Z 
2025-12-04T10:11:57.7400173Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7400694Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7400701Z 
2025-12-04T10:11:57.7400857Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7400981Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7401072Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7401414Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7401540Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7401602Z graph_break []
2025-12-04T10:11:57.7401726Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7402413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7402521Z   if out == self.unknown_value:
2025-12-04T10:11:57.7402647Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7402733Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7402859Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7403211Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7403268Z graph_break []
2025-12-04T10:11:57.7403395Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7403483Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7403603Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7403943Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7404010Z graph_break []
2025-12-04T10:11:57.7404502Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2adccf8b9e051d5a.xml -
2025-12-04T10:11:57.7404603Z =========================== short test summary info ============================
2025-12-04T10:11:57.7405975Z FAILED [0.5787s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7406013Z 
2025-12-04T10:11:57.7406138Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7406655Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7406663Z 
2025-12-04T10:11:57.7406818Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7406924Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7407045Z ================== 1 failed, 57 deselected, 2 rerun in 12.41s ==================
2025-12-04T10:11:57.7407102Z Got exit code 1
2025-12-04T10:11:57.7407166Z Retrying single test...
2025-12-04T10:11:57.7407432Z W1204 09:51:24.943000 52041 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7407820Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50234d62b4ab45ea.xml
2025-12-04T10:11:57.7407918Z ============================= test session starts ==============================
2025-12-04T10:11:57.7408125Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7408188Z cachedir: .pytest_cache
2025-12-04T10:11:57.7408501Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7408574Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7408640Z configfile: pytest.ini
2025-12-04T10:11:57.7408957Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7409087Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7409661Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7409787Z Running 1 items in this shard
2025-12-04T10:11:57.7409791Z 
2025-12-04T10:11:57.7410524Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:51:26.185574450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7410532Z 
2025-12-04T10:11:57.7410829Z [W1204 09:51:35.118662316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7410832Z 
2025-12-04T10:11:57.7411122Z [W1204 09:51:35.118925991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7411127Z 
2025-12-04T10:11:57.7411417Z [W1204 09:51:35.124850582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7411420Z 
2025-12-04T10:11:57.7411705Z [W1204 09:51:35.125451013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7411708Z 
2025-12-04T10:11:57.7411997Z [W1204 09:51:35.125622185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7412001Z 
2025-12-04T10:11:57.7412350Z [W1204 09:51:35.131093849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7412353Z 
2025-12-04T10:11:57.7412641Z [W1204 09:51:35.131660768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7412678Z 
2025-12-04T10:11:57.7412963Z [W1204 09:51:35.131835251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7412966Z 
2025-12-04T10:11:57.7413050Z ('RERUN', {'yellow': True}) [10.9828s] [100%]
2025-12-04T10:11:57.7413771Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:51:36.482674704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7413775Z 
2025-12-04T10:11:57.7414069Z [W1204 09:51:36.483206613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7414078Z 
2025-12-04T10:11:57.7414363Z [W1204 09:51:36.483346745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7414369Z 
2025-12-04T10:11:57.7414653Z [W1204 09:51:36.486285115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7414658Z 
2025-12-04T10:11:57.7414962Z [W1204 09:51:36.486847955 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7414965Z 
2025-12-04T10:11:57.7415250Z [W1204 09:51:36.486987718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7415253Z 
2025-12-04T10:11:57.7415543Z [W1204 09:51:36.491620847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7415546Z 
2025-12-04T10:11:57.7415832Z [W1204 09:51:36.492083745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7415837Z 
2025-12-04T10:11:57.7416164Z [W1204 09:51:36.492221447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7416167Z 
2025-12-04T10:11:57.7416246Z ('RERUN', {'yellow': True}) [0.5971s] [100%]
2025-12-04T10:11:57.7416968Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:51:37.077498584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7416972Z 
2025-12-04T10:11:57.7417453Z [W1204 09:51:37.078036213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7417457Z 
2025-12-04T10:11:57.7417746Z [W1204 09:51:37.078175936 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7417753Z 
2025-12-04T10:11:57.7418048Z [W1204 09:51:37.081150827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7418050Z 
2025-12-04T10:11:57.7418335Z [W1204 09:51:37.081715466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7418338Z 
2025-12-04T10:11:57.7418626Z [W1204 09:51:37.081853999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7418629Z 
2025-12-04T10:11:57.7418988Z [W1204 09:51:37.086418527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7418992Z 
2025-12-04T10:11:57.7419283Z [W1204 09:51:37.086877484 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7419319Z 
2025-12-04T10:11:57.7419607Z [W1204 09:51:37.087014867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7419611Z 
2025-12-04T10:11:57.7419672Z FAILED [0.5975s] [100%]
2025-12-04T10:11:57.7419676Z 
2025-12-04T10:11:57.7419762Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7420056Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7420135Z Traceback (most recent call last):
2025-12-04T10:11:57.7420442Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7420506Z     method(*args, **kwargs)
2025-12-04T10:11:57.7420807Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7420868Z     method(*args, **kwargs)
2025-12-04T10:11:57.7421161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7421221Z     with policy():
2025-12-04T10:11:57.7421515Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7421582Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7422391Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7422396Z 
2025-12-04T10:11:57.7422526Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7423046Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7423089Z 
2025-12-04T10:11:57.7423247Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7423376Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7423470Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7423821Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7423951Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7424008Z graph_break []
2025-12-04T10:11:57.7424136Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7424832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7424911Z   if out == self.unknown_value:
2025-12-04T10:11:57.7425203Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7425275Z Traceback (most recent call last):
2025-12-04T10:11:57.7425573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7425635Z     method(*args, **kwargs)
2025-12-04T10:11:57.7425994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7426059Z     method(*args, **kwargs)
2025-12-04T10:11:57.7426354Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7426449Z     with policy():
2025-12-04T10:11:57.7426744Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7426808Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7427641Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7427645Z 
2025-12-04T10:11:57.7427772Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7428296Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7428301Z 
2025-12-04T10:11:57.7428456Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7428582Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7428679Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7429039Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7429169Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7429227Z graph_break []
2025-12-04T10:11:57.7429346Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7430040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7430109Z   if out == self.unknown_value:
2025-12-04T10:11:57.7430276Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7430366Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7430488Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7430835Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7430893Z graph_break []
2025-12-04T10:11:57.7430976Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7431271Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7431343Z Traceback (most recent call last):
2025-12-04T10:11:57.7431640Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7431705Z     method(*args, **kwargs)
2025-12-04T10:11:57.7431996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7432063Z     method(*args, **kwargs)
2025-12-04T10:11:57.7432349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7432410Z     with policy():
2025-12-04T10:11:57.7432698Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7432762Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7433653Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7433710Z 
2025-12-04T10:11:57.7433835Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7434355Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7434359Z 
2025-12-04T10:11:57.7434513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7434633Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7434729Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7435070Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7435193Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7435254Z graph_break []
2025-12-04T10:11:57.7435375Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7436059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7436127Z   if out == self.unknown_value:
2025-12-04T10:11:57.7436250Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7436339Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7436460Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7436801Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7436860Z graph_break []
2025-12-04T10:11:57.7437031Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7437125Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7437244Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7437584Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7437640Z graph_break []
2025-12-04T10:11:57.7438128Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50234d62b4ab45ea.xml -
2025-12-04T10:11:57.7438232Z =========================== short test summary info ============================
2025-12-04T10:11:57.7439535Z FAILED [0.5975s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7439543Z 
2025-12-04T10:11:57.7439669Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7440292Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7440297Z 
2025-12-04T10:11:57.7440456Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7440561Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7440712Z ================== 1 failed, 57 deselected, 2 rerun in 12.20s ==================
2025-12-04T10:11:57.7440774Z Got exit code 1
2025-12-04T10:11:57.7441245Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7441498Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.7441771Z W1204 09:51:43.709000 52235 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7442160Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fda8ac892cff9b52.xml
2025-12-04T10:11:57.7442259Z ============================= test session starts ==============================
2025-12-04T10:11:57.7442464Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7442535Z cachedir: .pytest_cache
2025-12-04T10:11:57.7442837Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7442914Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7442981Z configfile: pytest.ini
2025-12-04T10:11:57.7443293Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7443417Z collecting ... collected 58 items / 32 deselected / 26 selected
2025-12-04T10:11:57.7443507Z stepcurrent: skipping 32 already run items.
2025-12-04T10:11:57.7443578Z Running 26 items in this shard
2025-12-04T10:11:57.7443581Z 
2025-12-04T10:11:57.7444080Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8613s] [  3%]
2025-12-04T10:11:57.7444602Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4565s] [  3%]
2025-12-04T10:11:57.7445040Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4424s] [  3%]
2025-12-04T10:11:57.7445047Z 
2025-12-04T10:11:57.7445130Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7445418Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7445492Z Traceback (most recent call last):
2025-12-04T10:11:57.7445796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7445861Z     method(*args, **kwargs)
2025-12-04T10:11:57.7446170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7446235Z     method(*args, **kwargs)
2025-12-04T10:11:57.7446530Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7446587Z     with policy():
2025-12-04T10:11:57.7446881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7446949Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7447811Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7447848Z 
2025-12-04T10:11:57.7447977Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7448506Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7448510Z 
2025-12-04T10:11:57.7448666Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7448792Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7448883Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7449236Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7449359Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7449416Z graph_break []
2025-12-04T10:11:57.7449705Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7449778Z Traceback (most recent call last):
2025-12-04T10:11:57.7450072Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7450134Z     method(*args, **kwargs)
2025-12-04T10:11:57.7450420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7450484Z     method(*args, **kwargs)
2025-12-04T10:11:57.7450774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7450831Z     with policy():
2025-12-04T10:11:57.7451124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7451188Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7451999Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7452041Z 
2025-12-04T10:11:57.7452167Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7452681Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7452691Z 
2025-12-04T10:11:57.7452843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7452965Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7453058Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7453404Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7453528Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7453589Z graph_break []
2025-12-04T10:11:57.7453710Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7453798Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7453918Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7454322Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7454384Z graph_break []
2025-12-04T10:11:57.7454466Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7454786Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7454859Z Traceback (most recent call last):
2025-12-04T10:11:57.7455151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7455221Z     method(*args, **kwargs)
2025-12-04T10:11:57.7455514Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7455576Z     method(*args, **kwargs)
2025-12-04T10:11:57.7455868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7455926Z     with policy():
2025-12-04T10:11:57.7456220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7456284Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7457095Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7457100Z 
2025-12-04T10:11:57.7457223Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7457732Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7457739Z 
2025-12-04T10:11:57.7457905Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7458026Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7458115Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7458497Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7458617Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7458677Z graph_break []
2025-12-04T10:11:57.7458798Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7458889Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7459008Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7459352Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7459408Z graph_break []
2025-12-04T10:11:57.7459532Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7459619Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7459743Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7460078Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7460135Z graph_break []
2025-12-04T10:11:57.7460637Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fda8ac892cff9b52.xml -
2025-12-04T10:11:57.7460737Z =========================== short test summary info ============================
2025-12-04T10:11:57.7462111Z FAILED [0.4424s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7462149Z 
2025-12-04T10:11:57.7462271Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7462793Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7462796Z 
2025-12-04T10:11:57.7462950Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7463056Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7463174Z ================== 1 failed, 32 deselected, 2 rerun in 2.78s ===================
2025-12-04T10:11:57.7463230Z Got exit code 1
2025-12-04T10:11:57.7463296Z Retrying single test...
2025-12-04T10:11:57.7463557Z W1204 09:51:53.379000 52423 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7463940Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6daec75554d576a1.xml
2025-12-04T10:11:57.7464035Z ============================= test session starts ==============================
2025-12-04T10:11:57.7464238Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7464302Z cachedir: .pytest_cache
2025-12-04T10:11:57.7464611Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7464685Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7464751Z configfile: pytest.ini
2025-12-04T10:11:57.7465062Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7465231Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7465798Z stepcurrent: skipping 32 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7465867Z Running 1 items in this shard
2025-12-04T10:11:57.7465871Z 
2025-12-04T10:11:57.7466611Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:51:54.425539452 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7466615Z 
2025-12-04T10:11:57.7466911Z [W1204 09:52:03.549824206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7466917Z 
2025-12-04T10:11:57.7467207Z [W1204 09:52:03.550103530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7467211Z 
2025-12-04T10:11:57.7467497Z [W1204 09:52:03.555824129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7467500Z 
2025-12-04T10:11:57.7467785Z [W1204 09:52:03.556380909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7467792Z 
2025-12-04T10:11:57.7468144Z [W1204 09:52:03.556540232 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7468147Z 
2025-12-04T10:11:57.7468434Z [W1204 09:52:03.562111207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7468469Z 
2025-12-04T10:11:57.7468760Z [W1204 09:52:03.562628436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7468763Z 
2025-12-04T10:11:57.7469050Z [W1204 09:52:03.562788309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7469053Z 
2025-12-04T10:11:57.7469134Z ('RERUN', {'yellow': True}) [10.9813s] [100%]
2025-12-04T10:11:57.7469855Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:52:04.741666298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7469860Z 
2025-12-04T10:11:57.7470150Z [W1204 09:52:04.742232758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7470155Z 
2025-12-04T10:11:57.7470442Z [W1204 09:52:04.742372350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7470445Z 
2025-12-04T10:11:57.7470733Z [W1204 09:52:04.745420683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7470737Z 
2025-12-04T10:11:57.7471022Z [W1204 09:52:04.745986482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7471025Z 
2025-12-04T10:11:57.7471312Z [W1204 09:52:04.746124225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7471319Z 
2025-12-04T10:11:57.7471602Z [W1204 09:52:04.750853946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7471607Z 
2025-12-04T10:11:57.7471892Z [W1204 09:52:04.751328104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7471931Z 
2025-12-04T10:11:57.7472221Z [W1204 09:52:04.751465847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7472224Z 
2025-12-04T10:11:57.7472302Z ('RERUN', {'yellow': True}) [0.4195s] [100%]
2025-12-04T10:11:57.7473020Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:52:05.158249477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7473024Z 
2025-12-04T10:11:57.7473311Z [W1204 09:52:05.158784916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7473315Z 
2025-12-04T10:11:57.7473604Z [W1204 09:52:05.158927598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7473608Z 
2025-12-04T10:11:57.7473892Z [W1204 09:52:05.161956730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7473895Z 
2025-12-04T10:11:57.7474186Z [W1204 09:52:05.162518670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7474189Z 
2025-12-04T10:11:57.7474539Z [W1204 09:52:05.162655162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7474543Z 
2025-12-04T10:11:57.7474828Z [W1204 09:52:05.167303161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7474831Z 
2025-12-04T10:11:57.7475152Z [W1204 09:52:05.167768220 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7475157Z 
2025-12-04T10:11:57.7475440Z [W1204 09:52:05.167903572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7475443Z 
2025-12-04T10:11:57.7475508Z FAILED [0.4160s] [100%]
2025-12-04T10:11:57.7475511Z 
2025-12-04T10:11:57.7475593Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7475903Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7475988Z Traceback (most recent call last):
2025-12-04T10:11:57.7476296Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7476365Z     method(*args, **kwargs)
2025-12-04T10:11:57.7476658Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7476722Z     method(*args, **kwargs)
2025-12-04T10:11:57.7477019Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7477077Z     with policy():
2025-12-04T10:11:57.7477371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7477436Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7478233Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7478237Z 
2025-12-04T10:11:57.7478366Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7478924Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7478928Z 
2025-12-04T10:11:57.7479091Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7479217Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7479310Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7479661Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7479786Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7479844Z graph_break []
2025-12-04T10:11:57.7480017Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7480721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7480794Z   if out == self.unknown_value:
2025-12-04T10:11:57.7481080Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7481154Z Traceback (most recent call last):
2025-12-04T10:11:57.7481519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7481583Z     method(*args, **kwargs)
2025-12-04T10:11:57.7481877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7481939Z     method(*args, **kwargs)
2025-12-04T10:11:57.7482263Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7482326Z     with policy():
2025-12-04T10:11:57.7482616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7482684Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7483494Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7483499Z 
2025-12-04T10:11:57.7483628Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7484143Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7484150Z 
2025-12-04T10:11:57.7484304Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7484443Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7484536Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7484885Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7485011Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7485070Z graph_break []
2025-12-04T10:11:57.7485195Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7485880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7485988Z   if out == self.unknown_value:
2025-12-04T10:11:57.7486110Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7486197Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7486321Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7486660Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7486716Z graph_break []
2025-12-04T10:11:57.7486804Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7487088Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7487164Z Traceback (most recent call last):
2025-12-04T10:11:57.7487460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7487521Z     method(*args, **kwargs)
2025-12-04T10:11:57.7487812Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7487873Z     method(*args, **kwargs)
2025-12-04T10:11:57.7488158Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7488217Z     with policy():
2025-12-04T10:11:57.7488589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7488657Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7489468Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7489506Z 
2025-12-04T10:11:57.7489631Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7490150Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7490153Z 
2025-12-04T10:11:57.7490307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7490444Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7490536Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7490877Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7491003Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7491059Z graph_break []
2025-12-04T10:11:57.7491185Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7491874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7491940Z   if out == self.unknown_value:
2025-12-04T10:11:57.7492064Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7492150Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7492273Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7492612Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7492708Z graph_break []
2025-12-04T10:11:57.7492831Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7492915Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7493033Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7493372Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7493428Z graph_break []
2025-12-04T10:11:57.7493924Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6daec75554d576a1.xml -
2025-12-04T10:11:57.7494026Z =========================== short test summary info ============================
2025-12-04T10:11:57.7495321Z FAILED [0.4160s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7495326Z 
2025-12-04T10:11:57.7495450Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7496033Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7496039Z 
2025-12-04T10:11:57.7496194Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7496332Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7496451Z ================== 1 failed, 57 deselected, 2 rerun in 11.84s ==================
2025-12-04T10:11:57.7496508Z Got exit code 1
2025-12-04T10:11:57.7496570Z Retrying single test...
2025-12-04T10:11:57.7496834Z W1204 09:52:11.844000 52616 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7497217Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7357125e19fc0b47.xml
2025-12-04T10:11:57.7497313Z ============================= test session starts ==============================
2025-12-04T10:11:57.7497524Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7497588Z cachedir: .pytest_cache
2025-12-04T10:11:57.7497896Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7497974Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7498038Z configfile: pytest.ini
2025-12-04T10:11:57.7498350Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7498478Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7499043Z stepcurrent: skipping 32 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7499115Z Running 1 items in this shard
2025-12-04T10:11:57.7499118Z 
2025-12-04T10:11:57.7499839Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:52:12.904307739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7499889Z 
2025-12-04T10:11:57.7500187Z [W1204 09:52:22.950981626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7500191Z 
2025-12-04T10:11:57.7500478Z [W1204 09:52:22.951234980 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7500484Z 
2025-12-04T10:11:57.7500772Z [W1204 09:52:22.957031440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7500777Z 
2025-12-04T10:11:57.7501062Z [W1204 09:52:22.957606810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7501065Z 
2025-12-04T10:11:57.7501363Z [W1204 09:52:22.957772233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7501369Z 
2025-12-04T10:11:57.7501657Z [W1204 09:52:22.963316978 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7501660Z 
2025-12-04T10:11:57.7501948Z [W1204 09:52:22.963840107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7501951Z 
2025-12-04T10:11:57.7502235Z [W1204 09:52:22.963994529 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7502239Z 
2025-12-04T10:11:57.7502387Z ('RERUN', {'yellow': True}) [10.9042s] [100%]
2025-12-04T10:11:57.7503155Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:52:23.132051848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7503194Z 
2025-12-04T10:11:57.7503517Z [W1204 09:52:23.132600648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7503524Z 
2025-12-04T10:11:57.7503812Z [W1204 09:52:23.132748540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7503816Z 
2025-12-04T10:11:57.7504103Z [W1204 09:52:23.135706361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7504107Z 
2025-12-04T10:11:57.7504401Z [W1204 09:52:23.136279421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7504404Z 
2025-12-04T10:11:57.7504691Z [W1204 09:52:23.136431143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7504699Z 
2025-12-04T10:11:57.7504989Z [W1204 09:52:23.141022762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7504992Z 
2025-12-04T10:11:57.7505278Z [W1204 09:52:23.141487710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7505281Z 
2025-12-04T10:11:57.7505568Z [W1204 09:52:23.141625562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7505573Z 
2025-12-04T10:11:57.7505653Z ('RERUN', {'yellow': True}) [0.4099s] [100%]
2025-12-04T10:11:57.7506370Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:52:23.540086527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7506419Z 
2025-12-04T10:11:57.7506711Z [W1204 09:52:23.540629867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7506714Z 
2025-12-04T10:11:57.7506999Z [W1204 09:52:23.540775709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7507005Z 
2025-12-04T10:11:57.7507286Z [W1204 09:52:23.543710920 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7507290Z 
2025-12-04T10:11:57.7507576Z [W1204 09:52:23.544272889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7507579Z 
2025-12-04T10:11:57.7507866Z [W1204 09:52:23.544420262 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7507872Z 
2025-12-04T10:11:57.7508156Z [W1204 09:52:23.549016781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7508159Z 
2025-12-04T10:11:57.7508447Z [W1204 09:52:23.549478059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7508450Z 
2025-12-04T10:11:57.7508734Z [W1204 09:52:23.549614761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7508738Z 
2025-12-04T10:11:57.7508801Z FAILED [0.4061s] [100%]
2025-12-04T10:11:57.7508944Z 
2025-12-04T10:11:57.7509026Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7509317Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7509429Z Traceback (most recent call last):
2025-12-04T10:11:57.7509736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7509802Z     method(*args, **kwargs)
2025-12-04T10:11:57.7510094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7510157Z     method(*args, **kwargs)
2025-12-04T10:11:57.7510446Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7510504Z     with policy():
2025-12-04T10:11:57.7510799Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7510865Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7511671Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7511678Z 
2025-12-04T10:11:57.7511810Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7512327Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7512331Z 
2025-12-04T10:11:57.7512488Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7512614Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7512711Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7513061Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7513252Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7513310Z graph_break []
2025-12-04T10:11:57.7513432Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7514126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7514196Z   if out == self.unknown_value:
2025-12-04T10:11:57.7514484Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7514555Z Traceback (most recent call last):
2025-12-04T10:11:57.7514852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7514916Z     method(*args, **kwargs)
2025-12-04T10:11:57.7515208Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7515269Z     method(*args, **kwargs)
2025-12-04T10:11:57.7515554Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7515616Z     with policy():
2025-12-04T10:11:57.7515906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7515971Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7516856Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7516894Z 
2025-12-04T10:11:57.7517180Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7517709Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7517713Z 
2025-12-04T10:11:57.7517865Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7517994Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7518087Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7518435Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7518562Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7518631Z graph_break []
2025-12-04T10:11:57.7518759Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7519448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7519516Z   if out == self.unknown_value:
2025-12-04T10:11:57.7519641Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7519728Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7519852Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7520233Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7520291Z graph_break []
2025-12-04T10:11:57.7520454Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7520741Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7520812Z Traceback (most recent call last):
2025-12-04T10:11:57.7521110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7521173Z     method(*args, **kwargs)
2025-12-04T10:11:57.7521467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7521534Z     method(*args, **kwargs)
2025-12-04T10:11:57.7521822Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7521882Z     with policy():
2025-12-04T10:11:57.7522187Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7522256Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7523070Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7523074Z 
2025-12-04T10:11:57.7523196Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7523812Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7523816Z 
2025-12-04T10:11:57.7523969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7524143Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7524232Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7524572Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7524699Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7524755Z graph_break []
2025-12-04T10:11:57.7524877Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7525568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7525635Z   if out == self.unknown_value:
2025-12-04T10:11:57.7525766Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7525855Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7525974Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7526316Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7526373Z graph_break []
2025-12-04T10:11:57.7526495Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7526582Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7526704Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7527045Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7527103Z graph_break []
2025-12-04T10:11:57.7527630Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7357125e19fc0b47.xml -
2025-12-04T10:11:57.7527731Z =========================== short test summary info ============================
2025-12-04T10:11:57.7529015Z FAILED [0.4061s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7529027Z 
2025-12-04T10:11:57.7529150Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7529664Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7529669Z 
2025-12-04T10:11:57.7529823Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7529925Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7530042Z ================== 1 failed, 57 deselected, 2 rerun in 11.74s ==================
2025-12-04T10:11:57.7530111Z Got exit code 1
2025-12-04T10:11:57.7530649Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7530896Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.7531161Z W1204 09:52:30.184000 52809 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7531581Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1372e7af4dc93064.xml
2025-12-04T10:11:57.7531678Z ============================= test session starts ==============================
2025-12-04T10:11:57.7531883Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7531949Z cachedir: .pytest_cache
2025-12-04T10:11:57.7532251Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7532327Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7532394Z configfile: pytest.ini
2025-12-04T10:11:57.7532706Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7532842Z collecting ... collected 58 items / 33 deselected / 25 selected
2025-12-04T10:11:57.7532930Z stepcurrent: skipping 33 already run items.
2025-12-04T10:11:57.7532997Z Running 25 items in this shard
2025-12-04T10:11:57.7533002Z 
2025-12-04T10:11:57.7533496Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9207s] [  4%]
2025-12-04T10:11:57.7533980Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5358s] [  4%]
2025-12-04T10:11:57.7534424Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5232s] [  4%]
2025-12-04T10:11:57.7534428Z 
2025-12-04T10:11:57.7534512Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7534842Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7534915Z Traceback (most recent call last):
2025-12-04T10:11:57.7535220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7535286Z     method(*args, **kwargs)
2025-12-04T10:11:57.7535575Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7535636Z     method(*args, **kwargs)
2025-12-04T10:11:57.7535928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7535989Z     with policy():
2025-12-04T10:11:57.7536280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7536348Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7537149Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7537153Z 
2025-12-04T10:11:57.7537278Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7537859Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7537864Z 
2025-12-04T10:11:57.7538023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7538158Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7538302Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7538856Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7538980Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7539040Z graph_break []
2025-12-04T10:11:57.7539324Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7539394Z Traceback (most recent call last):
2025-12-04T10:11:57.7539702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7539765Z     method(*args, **kwargs)
2025-12-04T10:11:57.7540058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7540129Z     method(*args, **kwargs)
2025-12-04T10:11:57.7540415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7540475Z     with policy():
2025-12-04T10:11:57.7540765Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7540829Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7541640Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7541644Z 
2025-12-04T10:11:57.7541766Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7542287Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7542329Z 
2025-12-04T10:11:57.7542484Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7542607Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7542705Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7543250Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7543377Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7543435Z graph_break []
2025-12-04T10:11:57.7543556Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7543650Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7543768Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7544311Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7544368Z graph_break []
2025-12-04T10:11:57.7544448Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7544800Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7544875Z Traceback (most recent call last):
2025-12-04T10:11:57.7545178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7545276Z     method(*args, **kwargs)
2025-12-04T10:11:57.7545568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7545632Z     method(*args, **kwargs)
2025-12-04T10:11:57.7545920Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7545977Z     with policy():
2025-12-04T10:11:57.7546270Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7546335Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7547148Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7547154Z 
2025-12-04T10:11:57.7547276Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7547788Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7547794Z 
2025-12-04T10:11:57.7547954Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7548079Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7548171Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7548709Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7548834Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7548930Z graph_break []
2025-12-04T10:11:57.7549051Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7549142Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7549260Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7549793Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7549853Z graph_break []
2025-12-04T10:11:57.7549978Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7550067Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7550186Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7550720Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7550782Z graph_break []
2025-12-04T10:11:57.7551267Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1372e7af4dc93064.xml -
2025-12-04T10:11:57.7551369Z =========================== short test summary info ============================
2025-12-04T10:11:57.7552717Z FAILED [0.5232s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7552755Z 
2025-12-04T10:11:57.7552882Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7553395Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7553399Z 
2025-12-04T10:11:57.7553549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7553657Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7553777Z ================== 1 failed, 33 deselected, 2 rerun in 3.00s ===================
2025-12-04T10:11:57.7553836Z Got exit code 1
2025-12-04T10:11:57.7553898Z Retrying single test...
2025-12-04T10:11:57.7554160Z W1204 09:52:39.782000 52998 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7554550Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1890de1440f6da93.xml
2025-12-04T10:11:57.7554643Z ============================= test session starts ==============================
2025-12-04T10:11:57.7554845Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7554910Z cachedir: .pytest_cache
2025-12-04T10:11:57.7555215Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7555294Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7555358Z configfile: pytest.ini
2025-12-04T10:11:57.7555667Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7555802Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7556405Z stepcurrent: skipping 33 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7556480Z Running 1 items in this shard
2025-12-04T10:11:57.7556484Z 
2025-12-04T10:11:57.7557206Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:52:41.370651197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7557210Z 
2025-12-04T10:11:57.7557510Z [W1204 09:52:50.605282863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7557516Z 
2025-12-04T10:11:57.7557813Z [W1204 09:52:50.605532867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7557819Z 
2025-12-04T10:11:57.7558109Z [W1204 09:52:50.611415789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7558113Z 
2025-12-04T10:11:57.7558400Z [W1204 09:52:50.612014989 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7558403Z 
2025-12-04T10:11:57.7558685Z [W1204 09:52:50.612200232 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7558688Z 
2025-12-04T10:11:57.7559042Z [W1204 09:52:50.617565303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7559046Z 
2025-12-04T10:11:57.7559332Z [W1204 09:52:50.618088632 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7559369Z 
2025-12-04T10:11:57.7559654Z [W1204 09:52:50.618249745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7559657Z 
2025-12-04T10:11:57.7559737Z ('RERUN', {'yellow': True}) [11.1748s] [100%]
2025-12-04T10:11:57.7560501Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:52:51.414759709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7560505Z 
2025-12-04T10:11:57.7560795Z [W1204 09:52:51.415268477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7560798Z 
2025-12-04T10:11:57.7561082Z [W1204 09:52:51.415409040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7561090Z 
2025-12-04T10:11:57.7561373Z [W1204 09:52:51.418276519 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7561376Z 
2025-12-04T10:11:57.7561658Z [W1204 09:52:51.418727346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7561662Z 
2025-12-04T10:11:57.7561949Z [W1204 09:52:51.418871509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7561952Z 
2025-12-04T10:11:57.7562239Z [W1204 09:52:51.423353486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7562242Z 
2025-12-04T10:11:57.7562527Z [W1204 09:52:51.423813924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7562570Z 
2025-12-04T10:11:57.7562856Z [W1204 09:52:51.423949546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7562859Z 
2025-12-04T10:11:57.7562937Z ('RERUN', {'yellow': True}) [0.4976s] [100%]
2025-12-04T10:11:57.7563654Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:52:51.910875157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7563659Z 
2025-12-04T10:11:57.7563951Z [W1204 09:52:51.911383756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7563953Z 
2025-12-04T10:11:57.7564239Z [W1204 09:52:51.911522879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7564245Z 
2025-12-04T10:11:57.7564529Z [W1204 09:52:51.914423728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7564532Z 
2025-12-04T10:11:57.7564820Z [W1204 09:52:51.914874276 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7564822Z 
2025-12-04T10:11:57.7565107Z [W1204 09:52:51.915011578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7565110Z 
2025-12-04T10:11:57.7565491Z [W1204 09:52:51.919514225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7565496Z 
2025-12-04T10:11:57.7565785Z [W1204 09:52:51.919969053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7565824Z 
2025-12-04T10:11:57.7566115Z [W1204 09:52:51.920165866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7566118Z 
2025-12-04T10:11:57.7566177Z FAILED [0.4954s] [100%]
2025-12-04T10:11:57.7566180Z 
2025-12-04T10:11:57.7566259Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7566548Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7566620Z Traceback (most recent call last):
2025-12-04T10:11:57.7566933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7566997Z     method(*args, **kwargs)
2025-12-04T10:11:57.7567288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7567357Z     method(*args, **kwargs)
2025-12-04T10:11:57.7567647Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7567709Z     with policy():
2025-12-04T10:11:57.7568002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7568067Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7568869Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7568873Z 
2025-12-04T10:11:57.7568998Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7569517Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7569559Z 
2025-12-04T10:11:57.7569715Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7569841Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7569948Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7570496Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7570625Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7570680Z graph_break []
2025-12-04T10:11:57.7570802Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7571497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7571569Z   if out == self.unknown_value:
2025-12-04T10:11:57.7571858Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7571928Z Traceback (most recent call last):
2025-12-04T10:11:57.7572221Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7572353Z     method(*args, **kwargs)
2025-12-04T10:11:57.7572643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7572703Z     method(*args, **kwargs)
2025-12-04T10:11:57.7572996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7573089Z     with policy():
2025-12-04T10:11:57.7573387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7573451Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7574264Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7574274Z 
2025-12-04T10:11:57.7574399Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7574915Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7574922Z 
2025-12-04T10:11:57.7575077Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7575200Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7575294Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7575834Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7575962Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7576021Z graph_break []
2025-12-04T10:11:57.7576142Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7576836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7576943Z   if out == self.unknown_value:
2025-12-04T10:11:57.7577064Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7577156Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7577278Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7577818Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7577878Z graph_break []
2025-12-04T10:11:57.7577958Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7578255Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7578330Z Traceback (most recent call last):
2025-12-04T10:11:57.7578627Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7578692Z     method(*args, **kwargs)
2025-12-04T10:11:57.7578980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7579051Z     method(*args, **kwargs)
2025-12-04T10:11:57.7579338Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7579464Z     with policy():
2025-12-04T10:11:57.7579759Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7579825Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7580668Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7580676Z 
2025-12-04T10:11:57.7580799Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7581310Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7581314Z 
2025-12-04T10:11:57.7581473Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7581595Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7581686Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7582227Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7582349Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7582410Z graph_break []
2025-12-04T10:11:57.7582531Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7583223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7583291Z   if out == self.unknown_value:
2025-12-04T10:11:57.7583410Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7583505Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7583667Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7584202Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7584261Z graph_break []
2025-12-04T10:11:57.7584380Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7584469Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7584591Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7585123Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7585184Z graph_break []
2025-12-04T10:11:57.7585672Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1890de1440f6da93.xml -
2025-12-04T10:11:57.7585777Z =========================== short test summary info ============================
2025-12-04T10:11:57.7587123Z FAILED [0.4954s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7587128Z 
2025-12-04T10:11:57.7587254Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7587806Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7587811Z 
2025-12-04T10:11:57.7587963Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7588070Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7588184Z ================== 1 failed, 57 deselected, 2 rerun in 12.19s ==================
2025-12-04T10:11:57.7588242Z Got exit code 1
2025-12-04T10:11:57.7588305Z Retrying single test...
2025-12-04T10:11:57.7588582Z W1204 09:52:58.559000 53192 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7588972Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed71c1109750bb2.xml
2025-12-04T10:11:57.7589066Z ============================= test session starts ==============================
2025-12-04T10:11:57.7589276Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7589339Z cachedir: .pytest_cache
2025-12-04T10:11:57.7589643Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7589721Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7589783Z configfile: pytest.ini
2025-12-04T10:11:57.7590092Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7590228Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7590794Z stepcurrent: skipping 33 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7590905Z Running 1 items in this shard
2025-12-04T10:11:57.7590908Z 
2025-12-04T10:11:57.7591632Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:53:00.167238164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7591636Z 
2025-12-04T10:11:57.7591933Z [W1204 09:53:09.460927240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7591937Z 
2025-12-04T10:11:57.7592227Z [W1204 09:53:09.461197744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7592230Z 
2025-12-04T10:11:57.7592518Z [W1204 09:53:09.467883116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7592528Z 
2025-12-04T10:11:57.7592814Z [W1204 09:53:09.468499537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7592817Z 
2025-12-04T10:11:57.7593105Z [W1204 09:53:09.468686800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7593109Z 
2025-12-04T10:11:57.7593397Z [W1204 09:53:09.474235973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7593401Z 
2025-12-04T10:11:57.7593769Z [W1204 09:53:09.474777022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7593772Z 
2025-12-04T10:11:57.7594074Z [W1204 09:53:09.474937585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7594110Z 
2025-12-04T10:11:57.7594192Z ('RERUN', {'yellow': True}) [11.2609s] [100%]
2025-12-04T10:11:57.7594915Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:53:10.286934849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7594919Z 
2025-12-04T10:11:57.7595205Z [W1204 09:53:10.287449728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7595210Z 
2025-12-04T10:11:57.7595504Z [W1204 09:53:10.287594960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7595507Z 
2025-12-04T10:11:57.7595793Z [W1204 09:53:10.290577620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7595799Z 
2025-12-04T10:11:57.7596083Z [W1204 09:53:10.291043678 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7596092Z 
2025-12-04T10:11:57.7596376Z [W1204 09:53:10.291182290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7596380Z 
2025-12-04T10:11:57.7596664Z [W1204 09:53:10.295685696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7596667Z 
2025-12-04T10:11:57.7596956Z [W1204 09:53:10.296144103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7596959Z 
2025-12-04T10:11:57.7597245Z [W1204 09:53:10.296281516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7597250Z 
2025-12-04T10:11:57.7597368Z ('RERUN', {'yellow': True}) [0.5062s] [100%]
2025-12-04T10:11:57.7598088Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:53:10.789865098 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7598091Z 
2025-12-04T10:11:57.7598382Z [W1204 09:53:10.790402997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7598386Z 
2025-12-04T10:11:57.7598675Z [W1204 09:53:10.790549509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7598677Z 
2025-12-04T10:11:57.7598963Z [W1204 09:53:10.793484339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7598971Z 
2025-12-04T10:11:57.7599257Z [W1204 09:53:10.793939006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7599260Z 
2025-12-04T10:11:57.7599543Z [W1204 09:53:10.794078679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7599546Z 
2025-12-04T10:11:57.7599834Z [W1204 09:53:10.798646965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7599837Z 
2025-12-04T10:11:57.7600243Z [W1204 09:53:10.799110323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7600247Z 
2025-12-04T10:11:57.7600539Z [W1204 09:53:10.799248985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7600542Z 
2025-12-04T10:11:57.7600636Z FAILED [0.5010s] [100%]
2025-12-04T10:11:57.7600641Z 
2025-12-04T10:11:57.7600724Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7601014Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7601086Z Traceback (most recent call last):
2025-12-04T10:11:57.7601395Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7601460Z     method(*args, **kwargs)
2025-12-04T10:11:57.7601754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7601819Z     method(*args, **kwargs)
2025-12-04T10:11:57.7602107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7602167Z     with policy():
2025-12-04T10:11:57.7602460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7602527Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7603329Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7603334Z 
2025-12-04T10:11:57.7603458Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7603977Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7603980Z 
2025-12-04T10:11:57.7604136Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7604301Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7604393Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7604938Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7605065Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7605121Z graph_break []
2025-12-04T10:11:57.7605246Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7605941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7606012Z   if out == self.unknown_value:
2025-12-04T10:11:57.7606304Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7606387Z Traceback (most recent call last):
2025-12-04T10:11:57.7606689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7606760Z     method(*args, **kwargs)
2025-12-04T10:11:57.7607047Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7607111Z     method(*args, **kwargs)
2025-12-04T10:11:57.7607465Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7607523Z     with policy():
2025-12-04T10:11:57.7607819Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7607917Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7608729Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7608734Z 
2025-12-04T10:11:57.7608856Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7609369Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7609373Z 
2025-12-04T10:11:57.7609534Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7609656Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7609756Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7610295Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7610421Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7610479Z graph_break []
2025-12-04T10:11:57.7610599Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7611293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7611362Z   if out == self.unknown_value:
2025-12-04T10:11:57.7611484Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7611617Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7611740Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7612281Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7612337Z graph_break []
2025-12-04T10:11:57.7612419Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7612711Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7612782Z Traceback (most recent call last):
2025-12-04T10:11:57.7613079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7613144Z     method(*args, **kwargs)
2025-12-04T10:11:57.7613435Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7613499Z     method(*args, **kwargs)
2025-12-04T10:11:57.7613787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7613848Z     with policy():
2025-12-04T10:11:57.7614145Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7614210Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7615089Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7615126Z 
2025-12-04T10:11:57.7615251Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7615768Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7615776Z 
2025-12-04T10:11:57.7615938Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7616062Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7616163Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7616700Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7616823Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7616887Z graph_break []
2025-12-04T10:11:57.7617139Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7617835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7617900Z   if out == self.unknown_value:
2025-12-04T10:11:57.7618023Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7618118Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7618238Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7618776Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7618998Z graph_break []
2025-12-04T10:11:57.7619120Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7619211Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7619331Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7619863Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7619928Z graph_break []
2025-12-04T10:11:57.7620411Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed71c1109750bb2.xml -
2025-12-04T10:11:57.7620515Z =========================== short test summary info ============================
2025-12-04T10:11:57.7621799Z FAILED [0.5010s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7621804Z 
2025-12-04T10:11:57.7621934Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7622541Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7622587Z 
2025-12-04T10:11:57.7622744Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7622850Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7622976Z ================== 1 failed, 57 deselected, 2 rerun in 12.29s ==================
2025-12-04T10:11:57.7623038Z Got exit code 1
2025-12-04T10:11:57.7623514Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7623754Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.7624019Z W1204 09:53:17.444000 53386 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7624404Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b4b98b76b112369.xml
2025-12-04T10:11:57.7624508Z ============================= test session starts ==============================
2025-12-04T10:11:57.7624715Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7624779Z cachedir: .pytest_cache
2025-12-04T10:11:57.7625090Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7625164Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7625231Z configfile: pytest.ini
2025-12-04T10:11:57.7625545Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7625670Z collecting ... collected 58 items / 34 deselected / 24 selected
2025-12-04T10:11:57.7625759Z stepcurrent: skipping 34 already run items.
2025-12-04T10:11:57.7625825Z Running 24 items in this shard
2025-12-04T10:11:57.7625829Z 
2025-12-04T10:11:57.7626326Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8972s] [  4%]
2025-12-04T10:11:57.7626852Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4919s] [  4%]
2025-12-04T10:11:57.7627291Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4748s] [  4%]
2025-12-04T10:11:57.7627294Z 
2025-12-04T10:11:57.7627379Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7627668Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7627743Z Traceback (most recent call last):
2025-12-04T10:11:57.7628068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7628133Z     method(*args, **kwargs)
2025-12-04T10:11:57.7628442Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7628504Z     method(*args, **kwargs)
2025-12-04T10:11:57.7628799Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7628856Z     with policy():
2025-12-04T10:11:57.7629222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7629291Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7630100Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7630139Z 
2025-12-04T10:11:57.7630272Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7630794Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7630798Z 
2025-12-04T10:11:57.7630957Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7631092Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7631185Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7631539Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7631667Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7631725Z graph_break []
2025-12-04T10:11:57.7632018Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7632090Z Traceback (most recent call last):
2025-12-04T10:11:57.7632384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7632449Z     method(*args, **kwargs)
2025-12-04T10:11:57.7632742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7632818Z     method(*args, **kwargs)
2025-12-04T10:11:57.7633108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7633167Z     with policy():
2025-12-04T10:11:57.7633463Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7633567Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7634390Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7634393Z 
2025-12-04T10:11:57.7634520Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7635035Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7635042Z 
2025-12-04T10:11:57.7635196Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7635324Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7635417Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7635762Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7635884Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7635945Z graph_break []
2025-12-04T10:11:57.7636069Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7636227Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7636348Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7636689Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7636804Z graph_break []
2025-12-04T10:11:57.7636888Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7637177Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7637249Z Traceback (most recent call last):
2025-12-04T10:11:57.7637556Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7637621Z     method(*args, **kwargs)
2025-12-04T10:11:57.7637922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7637983Z     method(*args, **kwargs)
2025-12-04T10:11:57.7638272Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7638330Z     with policy():
2025-12-04T10:11:57.7638625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7638692Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7639508Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7639512Z 
2025-12-04T10:11:57.7639638Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7640230Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7640234Z 
2025-12-04T10:11:57.7640392Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7640570Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7640659Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7641005Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7641127Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7641187Z graph_break []
2025-12-04T10:11:57.7641308Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7641399Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7641520Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7641859Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7641918Z graph_break []
2025-12-04T10:11:57.7642041Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7642125Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7642245Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7642582Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7642639Z graph_break []
2025-12-04T10:11:57.7643196Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b4b98b76b112369.xml -
2025-12-04T10:11:57.7643299Z =========================== short test summary info ============================
2025-12-04T10:11:57.7644594Z FAILED [0.4748s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7644652Z 
2025-12-04T10:11:57.7644774Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7645295Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7645298Z 
2025-12-04T10:11:57.7645451Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7645555Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7645675Z ================== 1 failed, 34 deselected, 2 rerun in 2.89s ===================
2025-12-04T10:11:57.7645732Z Got exit code 1
2025-12-04T10:11:57.7645795Z Retrying single test...
2025-12-04T10:11:57.7646060Z W1204 09:53:27.123000 53574 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7646450Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fc4f9e9eb787f925.xml
2025-12-04T10:11:57.7646546Z ============================= test session starts ==============================
2025-12-04T10:11:57.7646756Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7646826Z cachedir: .pytest_cache
2025-12-04T10:11:57.7647136Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7647249Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7647315Z configfile: pytest.ini
2025-12-04T10:11:57.7647630Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7647757Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7648327Z stepcurrent: skipping 34 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7648396Z Running 1 items in this shard
2025-12-04T10:11:57.7648399Z 
2025-12-04T10:11:57.7649134Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:53:28.212665546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7649141Z 
2025-12-04T10:11:57.7649439Z [W1204 09:53:37.279430147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7649443Z 
2025-12-04T10:11:57.7649731Z [W1204 09:53:37.279682901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7649735Z 
2025-12-04T10:11:57.7650041Z [W1204 09:53:37.285621283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7650045Z 
2025-12-04T10:11:57.7650401Z [W1204 09:53:37.286209653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7650408Z 
2025-12-04T10:11:57.7650705Z [W1204 09:53:37.286380036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7650743Z 
2025-12-04T10:11:57.7651034Z [W1204 09:53:37.291869150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7651037Z 
2025-12-04T10:11:57.7651329Z [W1204 09:53:37.292415789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7651332Z 
2025-12-04T10:11:57.7651615Z [W1204 09:53:37.292574992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7651619Z 
2025-12-04T10:11:57.7651699Z ('RERUN', {'yellow': True}) [10.9651s] [100%]
2025-12-04T10:11:57.7652419Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:53:38.501420760 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7652426Z 
2025-12-04T10:11:57.7652718Z [W1204 09:53:38.501939169 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7652721Z 
2025-12-04T10:11:57.7653008Z [W1204 09:53:38.502080992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7653011Z 
2025-12-04T10:11:57.7653295Z [W1204 09:53:38.504921060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7653302Z 
2025-12-04T10:11:57.7653589Z [W1204 09:53:38.505466709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7653592Z 
2025-12-04T10:11:57.7653884Z [W1204 09:53:38.505606852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7653888Z 
2025-12-04T10:11:57.7654213Z [W1204 09:53:38.510038907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7654216Z 
2025-12-04T10:11:57.7654500Z [W1204 09:53:38.510495595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7654503Z 
2025-12-04T10:11:57.7654795Z [W1204 09:53:38.510631928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7654799Z 
2025-12-04T10:11:57.7654875Z ('RERUN', {'yellow': True}) [0.4517s] [100%]
2025-12-04T10:11:57.7655602Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:53:39.952340525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7655607Z 
2025-12-04T10:11:57.7655896Z [W1204 09:53:39.952869494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7655899Z 
2025-12-04T10:11:57.7656187Z [W1204 09:53:39.953008606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7656189Z 
2025-12-04T10:11:57.7656474Z [W1204 09:53:39.955927736 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7656477Z 
2025-12-04T10:11:57.7656836Z [W1204 09:53:39.956492516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7656843Z 
2025-12-04T10:11:57.7657132Z [W1204 09:53:39.956632588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7657170Z 
2025-12-04T10:11:57.7657460Z [W1204 09:53:39.961202066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7657463Z 
2025-12-04T10:11:57.7657755Z [W1204 09:53:39.961663684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7657758Z 
2025-12-04T10:11:57.7658054Z [W1204 09:53:39.961799887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7658057Z 
2025-12-04T10:11:57.7658122Z FAILED [0.4439s] [100%]
2025-12-04T10:11:57.7658125Z 
2025-12-04T10:11:57.7658211Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7658505Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7658582Z Traceback (most recent call last):
2025-12-04T10:11:57.7658887Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7658959Z     method(*args, **kwargs)
2025-12-04T10:11:57.7659261Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7659327Z     method(*args, **kwargs)
2025-12-04T10:11:57.7659619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7659677Z     with policy():
2025-12-04T10:11:57.7659977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7660044Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7660845Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7660888Z 
2025-12-04T10:11:57.7661021Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7661539Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7661542Z 
2025-12-04T10:11:57.7661707Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7661834Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7661927Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7662276Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7662404Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7662466Z graph_break []
2025-12-04T10:11:57.7662590Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7663278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7663360Z   if out == self.unknown_value:
2025-12-04T10:11:57.7663719Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7663795Z Traceback (most recent call last):
2025-12-04T10:11:57.7664088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7664185Z     method(*args, **kwargs)
2025-12-04T10:11:57.7664476Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7664539Z     method(*args, **kwargs)
2025-12-04T10:11:57.7664826Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7664891Z     with policy():
2025-12-04T10:11:57.7665180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7665247Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7666066Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7666070Z 
2025-12-04T10:11:57.7666196Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7666714Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7666717Z 
2025-12-04T10:11:57.7666875Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7667002Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7667093Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7667439Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7667566Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7667624Z graph_break []
2025-12-04T10:11:57.7667749Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7668494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7668563Z   if out == self.unknown_value:
2025-12-04T10:11:57.7668685Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7668775Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7668899Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7669244Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7669301Z graph_break []
2025-12-04T10:11:57.7669387Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7669677Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7669747Z Traceback (most recent call last):
2025-12-04T10:11:57.7670043Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7670105Z     method(*args, **kwargs)
2025-12-04T10:11:57.7670398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7670458Z     method(*args, **kwargs)
2025-12-04T10:11:57.7670837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7670901Z     with policy():
2025-12-04T10:11:57.7671194Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7671297Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7672109Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7672114Z 
2025-12-04T10:11:57.7672236Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7672759Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7672763Z 
2025-12-04T10:11:57.7672916Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7673042Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7673134Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7673474Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7673600Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7673656Z graph_break []
2025-12-04T10:11:57.7673781Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7674470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7674538Z   if out == self.unknown_value:
2025-12-04T10:11:57.7674665Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7674791Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7674913Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7675253Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7675310Z graph_break []
2025-12-04T10:11:57.7675434Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7675522Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7675641Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7675984Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7676043Z graph_break []
2025-12-04T10:11:57.7676532Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fc4f9e9eb787f925.xml -
2025-12-04T10:11:57.7676633Z =========================== short test summary info ============================
2025-12-04T10:11:57.7677991Z FAILED [0.4439s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7677999Z 
2025-12-04T10:11:57.7678121Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7678638Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7678679Z 
2025-12-04T10:11:57.7678833Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7678937Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7679056Z ================== 1 failed, 57 deselected, 2 rerun in 11.88s ==================
2025-12-04T10:11:57.7679117Z Got exit code 1
2025-12-04T10:11:57.7679183Z Retrying single test...
2025-12-04T10:11:57.7679452Z W1204 09:53:45.590000 53767 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7679839Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cf10e80d579ed1a1.xml
2025-12-04T10:11:57.7679974Z ============================= test session starts ==============================
2025-12-04T10:11:57.7680182Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7680248Z cachedir: .pytest_cache
2025-12-04T10:11:57.7680558Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7680637Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7680702Z configfile: pytest.ini
2025-12-04T10:11:57.7681020Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7681147Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7681721Z stepcurrent: skipping 34 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7681790Z Running 1 items in this shard
2025-12-04T10:11:57.7681837Z 
2025-12-04T10:11:57.7682564Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:53:46.670123499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7682572Z 
2025-12-04T10:11:57.7682871Z [W1204 09:53:55.804668960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7682874Z 
2025-12-04T10:11:57.7683169Z [W1204 09:53:55.804924524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7683173Z 
2025-12-04T10:11:57.7683462Z [W1204 09:53:55.810853006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7683465Z 
2025-12-04T10:11:57.7683753Z [W1204 09:53:55.811452706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7683758Z 
2025-12-04T10:11:57.7684045Z [W1204 09:53:55.811627489 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7684049Z 
2025-12-04T10:11:57.7684333Z [W1204 09:53:55.817078283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7684336Z 
2025-12-04T10:11:57.7688529Z [W1204 09:53:55.817594412 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7688650Z 
2025-12-04T10:11:57.7689006Z [W1204 09:53:55.817753454 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7689011Z 
2025-12-04T10:11:57.7689103Z ('RERUN', {'yellow': True}) [11.0251s] [100%]
2025-12-04T10:11:57.7689894Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:53:57.031568185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7689899Z 
2025-12-04T10:11:57.7690213Z [W1204 09:53:57.032099774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7690217Z 
2025-12-04T10:11:57.7690517Z [W1204 09:53:57.032244837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7690524Z 
2025-12-04T10:11:57.7690810Z [W1204 09:53:57.035163797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7690819Z 
2025-12-04T10:11:57.7691103Z [W1204 09:53:57.035719727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7691111Z 
2025-12-04T10:11:57.7691403Z [W1204 09:53:57.035861829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7691407Z 
2025-12-04T10:11:57.7691703Z [W1204 09:53:57.040352536 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7691706Z 
2025-12-04T10:11:57.7691991Z [W1204 09:53:57.040817224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7691994Z 
2025-12-04T10:11:57.7692296Z [W1204 09:53:57.040955767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7692300Z 
2025-12-04T10:11:57.7692382Z ('RERUN', {'yellow': True}) [0.4530s] [100%]
2025-12-04T10:11:57.7693121Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 09:53:57.481780307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7693162Z 
2025-12-04T10:11:57.7693449Z [W1204 09:53:57.482312086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7693453Z 
2025-12-04T10:11:57.7693740Z [W1204 09:53:57.482459849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7693746Z 
2025-12-04T10:11:57.7694028Z [W1204 09:53:57.485346178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7694032Z 
2025-12-04T10:11:57.7694317Z [W1204 09:53:57.485905158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7694326Z 
2025-12-04T10:11:57.7694609Z [W1204 09:53:57.486047200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7694613Z 
2025-12-04T10:11:57.7694894Z [W1204 09:53:57.490522096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7694897Z 
2025-12-04T10:11:57.7695183Z [W1204 09:53:57.490991504 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7695186Z 
2025-12-04T10:11:57.7695550Z [W1204 09:53:57.491130707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7695554Z 
2025-12-04T10:11:57.7695622Z FAILED [0.4479s] [100%]
2025-12-04T10:11:57.7695626Z 
2025-12-04T10:11:57.7695751Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7696058Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7696138Z Traceback (most recent call last):
2025-12-04T10:11:57.7696454Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7696524Z     method(*args, **kwargs)
2025-12-04T10:11:57.7696820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7696883Z     method(*args, **kwargs)
2025-12-04T10:11:57.7697176Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7697237Z     with policy():
2025-12-04T10:11:57.7697542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7697614Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7698420Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7698424Z 
2025-12-04T10:11:57.7698560Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7699086Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7699090Z 
2025-12-04T10:11:57.7699253Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7699389Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7699527Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7699883Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7700011Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7700078Z graph_break []
2025-12-04T10:11:57.7700213Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7700917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7700996Z   if out == self.unknown_value:
2025-12-04T10:11:57.7701295Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7701378Z Traceback (most recent call last):
2025-12-04T10:11:57.7701688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7701752Z     method(*args, **kwargs)
2025-12-04T10:11:57.7702046Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7702107Z     method(*args, **kwargs)
2025-12-04T10:11:57.7702394Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7702549Z     with policy():
2025-12-04T10:11:57.7702843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7702913Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7703740Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7703780Z 
2025-12-04T10:11:57.7703911Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7704435Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7704438Z 
2025-12-04T10:11:57.7704600Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7704731Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7704827Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7705178Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7705309Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7705366Z graph_break []
2025-12-04T10:11:57.7705491Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7706177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7706248Z   if out == self.unknown_value:
2025-12-04T10:11:57.7706370Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7706459Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7706582Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7706973Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7707036Z graph_break []
2025-12-04T10:11:57.7707129Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7707420Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7707493Z Traceback (most recent call last):
2025-12-04T10:11:57.7707798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7707863Z     method(*args, **kwargs)
2025-12-04T10:11:57.7708158Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7708219Z     method(*args, **kwargs)
2025-12-04T10:11:57.7708510Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7708571Z     with policy():
2025-12-04T10:11:57.7708864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7708927Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7709811Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7709816Z 
2025-12-04T10:11:57.7709945Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7710468Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7710507Z 
2025-12-04T10:11:57.7710667Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7710801Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7710896Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7711245Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7711373Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7711436Z graph_break []
2025-12-04T10:11:57.7711561Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7712268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7712339Z   if out == self.unknown_value:
2025-12-04T10:11:57.7712469Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7712558Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7712680Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7713024Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7713085Z graph_break []
2025-12-04T10:11:57.7713210Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7713295Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7713413Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7713792Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7713849Z graph_break []
2025-12-04T10:11:57.7714346Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cf10e80d579ed1a1.xml -
2025-12-04T10:11:57.7714457Z =========================== short test summary info ============================
2025-12-04T10:11:57.7715759Z FAILED [0.4479s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7715770Z 
2025-12-04T10:11:57.7715895Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7716424Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7716428Z 
2025-12-04T10:11:57.7716588Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7716691Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7716879Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ==================
2025-12-04T10:11:57.7716939Z Got exit code 1
2025-12-04T10:11:57.7717636Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7717971Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.7718238Z W1204 09:54:04.110000 53960 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7718631Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f768cb22e37c95bb.xml
2025-12-04T10:11:57.7718728Z ============================= test session starts ==============================
2025-12-04T10:11:57.7718940Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7719014Z cachedir: .pytest_cache
2025-12-04T10:11:57.7719322Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7719400Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7719471Z configfile: pytest.ini
2025-12-04T10:11:57.7719790Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7719974Z collecting ... collected 58 items / 35 deselected / 23 selected
2025-12-04T10:11:57.7720063Z stepcurrent: skipping 35 already run items.
2025-12-04T10:11:57.7720133Z Running 23 items in this shard
2025-12-04T10:11:57.7720137Z 
2025-12-04T10:11:57.7720635Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8425s] [  4%]
2025-12-04T10:11:57.7721114Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4579s] [  4%]
2025-12-04T10:11:57.7721558Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4541s] [  4%]
2025-12-04T10:11:57.7721625Z 
2025-12-04T10:11:57.7721710Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7721999Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7722076Z Traceback (most recent call last):
2025-12-04T10:11:57.7722384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7722451Z     method(*args, **kwargs)
2025-12-04T10:11:57.7722746Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7722809Z     method(*args, **kwargs)
2025-12-04T10:11:57.7723096Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7723158Z     with policy():
2025-12-04T10:11:57.7723452Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7723516Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7724307Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7724311Z 
2025-12-04T10:11:57.7724537Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7725065Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7725102Z 
2025-12-04T10:11:57.7725269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7725401Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7725495Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7725847Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7725972Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7726032Z graph_break []
2025-12-04T10:11:57.7726321Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7726396Z Traceback (most recent call last):
2025-12-04T10:11:57.7726696Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7726764Z     method(*args, **kwargs)
2025-12-04T10:11:57.7727053Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7727119Z     method(*args, **kwargs)
2025-12-04T10:11:57.7727404Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7727469Z     with policy():
2025-12-04T10:11:57.7727767Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7727832Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7728647Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7728706Z 
2025-12-04T10:11:57.7728837Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7729357Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7729361Z 
2025-12-04T10:11:57.7729518Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7729646Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7729740Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7730087Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7730212Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7730270Z graph_break []
2025-12-04T10:11:57.7730395Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7730484Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7730603Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7730941Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7730998Z graph_break []
2025-12-04T10:11:57.7731082Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7731438Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7731512Z Traceback (most recent call last):
2025-12-04T10:11:57.7731815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7732000Z     method(*args, **kwargs)
2025-12-04T10:11:57.7732290Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7732355Z     method(*args, **kwargs)
2025-12-04T10:11:57.7732640Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7732696Z     with policy():
2025-12-04T10:11:57.7732991Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7733059Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7733862Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7733868Z 
2025-12-04T10:11:57.7733995Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7734509Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7734517Z 
2025-12-04T10:11:57.7734671Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7734856Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7735024Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7735572Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7735707Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7735771Z graph_break []
2025-12-04T10:11:57.7735960Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7736053Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7736177Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7736520Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7736591Z graph_break []
2025-12-04T10:11:57.7736717Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7736807Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7736931Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7737267Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7737334Z graph_break []
2025-12-04T10:11:57.7737833Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f768cb22e37c95bb.xml -
2025-12-04T10:11:57.7737933Z =========================== short test summary info ============================
2025-12-04T10:11:57.7739320Z FAILED [0.4541s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7739327Z 
2025-12-04T10:11:57.7739456Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7740019Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7740023Z 
2025-12-04T10:11:57.7740179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7740285Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7740399Z ================== 1 failed, 35 deselected, 2 rerun in 2.78s ===================
2025-12-04T10:11:57.7740457Z Got exit code 1
2025-12-04T10:11:57.7740525Z Retrying single test...
2025-12-04T10:11:57.7740792Z W1204 09:54:13.734000 54141 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7741182Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4cd3ee76d86b3b2d.xml
2025-12-04T10:11:57.7741276Z ============================= test session starts ==============================
2025-12-04T10:11:57.7741483Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7741550Z cachedir: .pytest_cache
2025-12-04T10:11:57.7741853Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7741938Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7742008Z configfile: pytest.ini
2025-12-04T10:11:57.7742324Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7742455Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7743021Z stepcurrent: skipping 35 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7743131Z Running 1 items in this shard
2025-12-04T10:11:57.7743135Z 
2025-12-04T10:11:57.7743860Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:54:14.782817801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7743864Z 
2025-12-04T10:11:57.7744163Z [W1204 09:54:23.870076425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7744167Z 
2025-12-04T10:11:57.7744459Z [W1204 09:54:23.870336260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7744462Z 
2025-12-04T10:11:57.7744749Z [W1204 09:54:23.876050747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7744755Z 
2025-12-04T10:11:57.7745041Z [W1204 09:54:23.876642597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7745044Z 
2025-12-04T10:11:57.7745330Z [W1204 09:54:23.876816911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7745333Z 
2025-12-04T10:11:57.7745622Z [W1204 09:54:23.882316554 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7745626Z 
2025-12-04T10:11:57.7745976Z [W1204 09:54:23.882833973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7745979Z 
2025-12-04T10:11:57.7746264Z [W1204 09:54:23.882994686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7746305Z 
2025-12-04T10:11:57.7746386Z ('RERUN', {'yellow': True}) [10.9347s] [100%]
2025-12-04T10:11:57.7747102Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:54:25.045640005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7747106Z 
2025-12-04T10:11:57.7747409Z [W1204 09:54:25.046183244 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7747412Z 
2025-12-04T10:11:57.7747702Z [W1204 09:54:25.046326437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7747706Z 
2025-12-04T10:11:57.7747993Z [W1204 09:54:25.049214997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7747999Z 
2025-12-04T10:11:57.7748293Z [W1204 09:54:25.049761356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7748296Z 
2025-12-04T10:11:57.7748583Z [W1204 09:54:25.049899748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7748588Z 
2025-12-04T10:11:57.7748871Z [W1204 09:54:25.054350355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7748875Z 
2025-12-04T10:11:57.7749163Z [W1204 09:54:25.054808642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7749166Z 
2025-12-04T10:11:57.7749448Z [W1204 09:54:25.054944025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7749487Z 
2025-12-04T10:11:57.7749565Z ('RERUN', {'yellow': True}) [0.4078s] [100%]
2025-12-04T10:11:57.7750279Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:54:25.451107691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7750283Z 
2025-12-04T10:11:57.7750567Z [W1204 09:54:25.451667841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7750570Z 
2025-12-04T10:11:57.7750862Z [W1204 09:54:25.451813523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7750866Z 
2025-12-04T10:11:57.7751151Z [W1204 09:54:25.454667302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7751157Z 
2025-12-04T10:11:57.7751445Z [W1204 09:54:25.455209571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7751448Z 
2025-12-04T10:11:57.7751734Z [W1204 09:54:25.455348283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7751737Z 
2025-12-04T10:11:57.7752025Z [W1204 09:54:25.459780559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7752029Z 
2025-12-04T10:11:57.7752381Z [W1204 09:54:25.460250507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7752384Z 
2025-12-04T10:11:57.7752687Z [W1204 09:54:25.460400830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7752742Z 
2025-12-04T10:11:57.7752806Z FAILED [0.4003s] [100%]
2025-12-04T10:11:57.7752810Z 
2025-12-04T10:11:57.7752894Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7753183Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7753255Z Traceback (most recent call last):
2025-12-04T10:11:57.7753574Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7753642Z     method(*args, **kwargs)
2025-12-04T10:11:57.7753938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7754001Z     method(*args, **kwargs)
2025-12-04T10:11:57.7754295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7754355Z     with policy():
2025-12-04T10:11:57.7754656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7754720Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7755519Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7755525Z 
2025-12-04T10:11:57.7755654Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7756175Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7756182Z 
2025-12-04T10:11:57.7756342Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7756510Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7756608Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7756954Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7757079Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7757141Z graph_break []
2025-12-04T10:11:57.7757263Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7757961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7758034Z   if out == self.unknown_value:
2025-12-04T10:11:57.7758329Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7758405Z Traceback (most recent call last):
2025-12-04T10:11:57.7758702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7758766Z     method(*args, **kwargs)
2025-12-04T10:11:57.7759055Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7759115Z     method(*args, **kwargs)
2025-12-04T10:11:57.7759468Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7759526Z     with policy():
2025-12-04T10:11:57.7759817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7760015Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7760816Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7760822Z 
2025-12-04T10:11:57.7760953Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7761474Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7761478Z 
2025-12-04T10:11:57.7761635Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7761758Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7761854Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7763911Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7764100Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7764163Z graph_break []
2025-12-04T10:11:57.7764307Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7765021Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7765094Z   if out == self.unknown_value:
2025-12-04T10:11:57.7765225Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7765326Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7765520Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7765899Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7765963Z graph_break []
2025-12-04T10:11:57.7766054Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7766350Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7766427Z Traceback (most recent call last):
2025-12-04T10:11:57.7766744Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7766812Z     method(*args, **kwargs)
2025-12-04T10:11:57.7767109Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7767175Z     method(*args, **kwargs)
2025-12-04T10:11:57.7767464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7767528Z     with policy():
2025-12-04T10:11:57.7767821Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7767887Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7768742Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7768747Z 
2025-12-04T10:11:57.7768879Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7769435Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7769440Z 
2025-12-04T10:11:57.7769604Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7769736Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7769830Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7770174Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7770304Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7770362Z graph_break []
2025-12-04T10:11:57.7770484Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7771190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7771324Z   if out == self.unknown_value:
2025-12-04T10:11:57.7771457Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7771551Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7771677Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7772024Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7772081Z graph_break []
2025-12-04T10:11:57.7772206Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7772294Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7772452Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7772806Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7772866Z graph_break []
2025-12-04T10:11:57.7773366Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4cd3ee76d86b3b2d.xml -
2025-12-04T10:11:57.7773476Z =========================== short test summary info ============================
2025-12-04T10:11:57.7774762Z FAILED [0.4003s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7774773Z 
2025-12-04T10:11:57.7774902Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7775419Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7775423Z 
2025-12-04T10:11:57.7775583Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7775721Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7775853Z ================== 1 failed, 57 deselected, 2 rerun in 11.77s ==================
2025-12-04T10:11:57.7775913Z Got exit code 1
2025-12-04T10:11:57.7775977Z Retrying single test...
2025-12-04T10:11:57.7776283Z W1204 09:54:32.096000 54327 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7776673Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dd5744bb7f1104d.xml
2025-12-04T10:11:57.7776769Z ============================= test session starts ==============================
2025-12-04T10:11:57.7776984Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7777056Z cachedir: .pytest_cache
2025-12-04T10:11:57.7777362Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7777443Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7777511Z configfile: pytest.ini
2025-12-04T10:11:57.7777826Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7777965Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7778579Z stepcurrent: skipping 35 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7778654Z Running 1 items in this shard
2025-12-04T10:11:57.7778657Z 
2025-12-04T10:11:57.7779386Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:54:33.144211969 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7779390Z 
2025-12-04T10:11:57.7779690Z [W1204 09:54:42.282457396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7779694Z 
2025-12-04T10:11:57.7779983Z [W1204 09:54:42.282706341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7780021Z 
2025-12-04T10:11:57.7780317Z [W1204 09:54:42.289237762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7780321Z 
2025-12-04T10:11:57.7780604Z [W1204 09:54:42.289815112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7780607Z 
2025-12-04T10:11:57.7780896Z [W1204 09:54:42.289979845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7780900Z 
2025-12-04T10:11:57.7781184Z [W1204 09:54:42.295395027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7781187Z 
2025-12-04T10:11:57.7781472Z [W1204 09:54:42.295924426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7781479Z 
2025-12-04T10:11:57.7781768Z [W1204 09:54:42.296088699 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7781771Z 
2025-12-04T10:11:57.7781854Z ('RERUN', {'yellow': True}) [10.9860s] [100%]
2025-12-04T10:11:57.7782629Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:54:43.460241201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7782633Z 
2025-12-04T10:11:57.7782921Z [W1204 09:54:43.460803510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7782925Z 
2025-12-04T10:11:57.7783251Z [W1204 09:54:43.460947613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7783256Z 
2025-12-04T10:11:57.7783542Z [W1204 09:54:43.463862763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7783545Z 
2025-12-04T10:11:57.7783833Z [W1204 09:54:43.464422332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7783836Z 
2025-12-04T10:11:57.7784121Z [W1204 09:54:43.464560035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7784126Z 
2025-12-04T10:11:57.7784410Z [W1204 09:54:43.469028751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7784416Z 
2025-12-04T10:11:57.7784698Z [W1204 09:54:43.469481259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7784705Z 
2025-12-04T10:11:57.7785027Z [W1204 09:54:43.469617101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7785031Z 
2025-12-04T10:11:57.7785123Z ('RERUN', {'yellow': True}) [0.4078s] [100%]
2025-12-04T10:11:57.7785841Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 09:54:43.865114200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7785847Z 
2025-12-04T10:11:57.7786138Z [W1204 09:54:43.865659739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7786142Z 
2025-12-04T10:11:57.7786428Z [W1204 09:54:43.865803881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7786473Z 
2025-12-04T10:11:57.7786760Z [W1204 09:54:43.868666770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7786763Z 
2025-12-04T10:11:57.7787050Z [W1204 09:54:43.869208229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7787053Z 
2025-12-04T10:11:57.7787339Z [W1204 09:54:43.869344891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7787342Z 
2025-12-04T10:11:57.7787629Z [W1204 09:54:43.873815757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7787632Z 
2025-12-04T10:11:57.7787924Z [W1204 09:54:43.874272465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7787932Z 
2025-12-04T10:11:57.7788220Z [W1204 09:54:43.874406147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7788224Z 
2025-12-04T10:11:57.7788283Z FAILED [0.4052s] [100%]
2025-12-04T10:11:57.7788287Z 
2025-12-04T10:11:57.7788373Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7788663Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7788736Z Traceback (most recent call last):
2025-12-04T10:11:57.7789090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7789157Z     method(*args, **kwargs)
2025-12-04T10:11:57.7789453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7789550Z     method(*args, **kwargs)
2025-12-04T10:11:57.7789840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7789907Z     with policy():
2025-12-04T10:11:57.7790201Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7790270Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7791062Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7791066Z 
2025-12-04T10:11:57.7791198Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7791720Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7791726Z 
2025-12-04T10:11:57.7791925Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7792061Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7792157Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7792504Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7792635Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7792693Z graph_break []
2025-12-04T10:11:57.7792825Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7793524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7793642Z   if out == self.unknown_value:
2025-12-04T10:11:57.7793941Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7794015Z Traceback (most recent call last):
2025-12-04T10:11:57.7794318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7794381Z     method(*args, **kwargs)
2025-12-04T10:11:57.7794669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7794733Z     method(*args, **kwargs)
2025-12-04T10:11:57.7795020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7795082Z     with policy():
2025-12-04T10:11:57.7795379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7795443Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7796244Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7796249Z 
2025-12-04T10:11:57.7796414Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7796932Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7796974Z 
2025-12-04T10:11:57.7797133Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7797260Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7797355Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7797698Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7797827Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7797887Z graph_break []
2025-12-04T10:11:57.7798013Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7798705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7798777Z   if out == self.unknown_value:
2025-12-04T10:11:57.7798900Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7799037Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7799162Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7799507Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7799570Z graph_break []
2025-12-04T10:11:57.7799655Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7799995Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7800068Z Traceback (most recent call last):
2025-12-04T10:11:57.7800370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7800476Z     method(*args, **kwargs)
2025-12-04T10:11:57.7800771Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7800835Z     method(*args, **kwargs)
2025-12-04T10:11:57.7801122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7801179Z     with policy():
2025-12-04T10:11:57.7801476Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7801540Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7802347Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7802354Z 
2025-12-04T10:11:57.7802478Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7802994Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7803001Z 
2025-12-04T10:11:57.7803156Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7803325Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7803425Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7803766Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7803925Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7803987Z graph_break []
2025-12-04T10:11:57.7804110Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7804806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7804873Z   if out == self.unknown_value:
2025-12-04T10:11:57.7804995Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7805087Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7805206Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7805545Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7805606Z graph_break []
2025-12-04T10:11:57.7805726Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7805856Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7805989Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7806329Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7806390Z graph_break []
2025-12-04T10:11:57.7806878Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dd5744bb7f1104d.xml -
2025-12-04T10:11:57.7806980Z =========================== short test summary info ============================
2025-12-04T10:11:57.7808256Z FAILED [0.4052s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7808315Z 
2025-12-04T10:11:57.7808442Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7808953Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7808957Z 
2025-12-04T10:11:57.7809112Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7809216Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7809339Z ================== 1 failed, 57 deselected, 2 rerun in 11.82s ==================
2025-12-04T10:11:57.7809398Z Got exit code 1
2025-12-04T10:11:57.7809871Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.7810111Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.7810375Z W1204 09:54:50.517000 54513 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7810794Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8f8bda0471bacaab.xml
2025-12-04T10:11:57.7810895Z ============================= test session starts ==============================
2025-12-04T10:11:57.7811135Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7811201Z cachedir: .pytest_cache
2025-12-04T10:11:57.7811515Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7811588Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7811651Z configfile: pytest.ini
2025-12-04T10:11:57.7811968Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7812096Z collecting ... collected 58 items / 36 deselected / 22 selected
2025-12-04T10:11:57.7812187Z stepcurrent: skipping 36 already run items.
2025-12-04T10:11:57.7812257Z Running 22 items in this shard
2025-12-04T10:11:57.7812261Z 
2025-12-04T10:11:57.7812756Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9223s] [  4%]
2025-12-04T10:11:57.7813280Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5383s] [  4%]
2025-12-04T10:11:57.7813720Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.5252s] [  4%]
2025-12-04T10:11:57.7813723Z 
2025-12-04T10:11:57.7813815Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7814103Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7814179Z Traceback (most recent call last):
2025-12-04T10:11:57.7814487Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7814553Z     method(*args, **kwargs)
2025-12-04T10:11:57.7814882Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7814945Z     method(*args, **kwargs)
2025-12-04T10:11:57.7815231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7815294Z     with policy():
2025-12-04T10:11:57.7815585Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7815652Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7816439Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7816446Z 
2025-12-04T10:11:57.7816574Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7817303Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7817308Z 
2025-12-04T10:11:57.7817469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7817601Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7817695Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7818288Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7818423Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7818517Z graph_break []
2025-12-04T10:11:57.7818805Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7818878Z Traceback (most recent call last):
2025-12-04T10:11:57.7819185Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7819254Z     method(*args, **kwargs)
2025-12-04T10:11:57.7819542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7819607Z     method(*args, **kwargs)
2025-12-04T10:11:57.7819895Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7819953Z     with policy():
2025-12-04T10:11:57.7820245Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7820313Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7821159Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7821166Z 
2025-12-04T10:11:57.7821291Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7821806Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7821810Z 
2025-12-04T10:11:57.7821965Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7822091Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7822230Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7822778Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7822904Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7822966Z graph_break []
2025-12-04T10:11:57.7823091Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7823179Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7823308Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7823845Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7823910Z graph_break []
2025-12-04T10:11:57.7823993Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7824279Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7824356Z Traceback (most recent call last):
2025-12-04T10:11:57.7824657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7824721Z     method(*args, **kwargs)
2025-12-04T10:11:57.7825048Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7825110Z     method(*args, **kwargs)
2025-12-04T10:11:57.7825399Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7825492Z     with policy():
2025-12-04T10:11:57.7825784Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7825853Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7826654Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7826658Z 
2025-12-04T10:11:57.7826786Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7827302Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7827307Z 
2025-12-04T10:11:57.7827463Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7827588Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7827716Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7828260Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7828387Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7828446Z graph_break []
2025-12-04T10:11:57.7828574Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7828663Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7828784Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7829320Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7829416Z graph_break []
2025-12-04T10:11:57.7829537Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7829622Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7829744Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7830281Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7830338Z graph_break []
2025-12-04T10:11:57.7830831Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8f8bda0471bacaab.xml -
2025-12-04T10:11:57.7830944Z =========================== short test summary info ============================
2025-12-04T10:11:57.7832224Z FAILED [0.5252s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7832264Z 
2025-12-04T10:11:57.7832388Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7832904Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7832942Z 
2025-12-04T10:11:57.7833097Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7833202Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7833322Z ================== 1 failed, 36 deselected, 2 rerun in 3.01s ===================
2025-12-04T10:11:57.7833378Z Got exit code 1
2025-12-04T10:11:57.7833442Z Retrying single test...
2025-12-04T10:11:57.7833703Z W1204 09:55:00.200000 54695 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7834083Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-39442f28ac15f7dd.xml
2025-12-04T10:11:57.7834180Z ============================= test session starts ==============================
2025-12-04T10:11:57.7834386Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7834457Z cachedir: .pytest_cache
2025-12-04T10:11:57.7834760Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7834890Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7834956Z configfile: pytest.ini
2025-12-04T10:11:57.7835269Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7835398Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7835967Z stepcurrent: skipping 36 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7836035Z Running 1 items in this shard
2025-12-04T10:11:57.7836039Z 
2025-12-04T10:11:57.7836768Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:55:01.781151099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7836807Z 
2025-12-04T10:11:57.7837116Z [W1204 09:55:10.739943601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7837120Z 
2025-12-04T10:11:57.7837412Z [W1204 09:55:10.740238936 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7837416Z 
2025-12-04T10:11:57.7837707Z [W1204 09:55:10.746207456 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7837711Z 
2025-12-04T10:11:57.7838000Z [W1204 09:55:10.746791646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7838007Z 
2025-12-04T10:11:57.7838294Z [W1204 09:55:10.746951619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7838299Z 
2025-12-04T10:11:57.7838584Z [W1204 09:55:10.752347129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7838591Z 
2025-12-04T10:11:57.7838876Z [W1204 09:55:10.752884938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7838880Z 
2025-12-04T10:11:57.7839203Z [W1204 09:55:10.753058391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7839206Z 
2025-12-04T10:11:57.7839291Z ('RERUN', {'yellow': True}) [10.8960s] [100%]
2025-12-04T10:11:57.7840056Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:55:11.552367815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7840098Z 
2025-12-04T10:11:57.7840389Z [W1204 09:55:11.552908864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7840392Z 
2025-12-04T10:11:57.7840678Z [W1204 09:55:11.553053426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7840682Z 
2025-12-04T10:11:57.7840971Z [W1204 09:55:11.555906215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7840975Z 
2025-12-04T10:11:57.7841261Z [W1204 09:55:11.556357122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7841267Z 
2025-12-04T10:11:57.7841551Z [W1204 09:55:11.556499185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7841557Z 
2025-12-04T10:11:57.7841882Z [W1204 09:55:11.561060492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7841886Z 
2025-12-04T10:11:57.7842173Z [W1204 09:55:11.561513349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7842176Z 
2025-12-04T10:11:57.7842478Z [W1204 09:55:11.561652312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7842481Z 
2025-12-04T10:11:57.7842560Z ('RERUN', {'yellow': True}) [0.4962s] [100%]
2025-12-04T10:11:57.7843277Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:55:12.047069967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7843318Z 
2025-12-04T10:11:57.7843605Z [W1204 09:55:12.047599646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7843608Z 
2025-12-04T10:11:57.7843896Z [W1204 09:55:12.047739979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7843899Z 
2025-12-04T10:11:57.7844185Z [W1204 09:55:12.050620347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7844188Z 
2025-12-04T10:11:57.7844476Z [W1204 09:55:12.051069204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7844482Z 
2025-12-04T10:11:57.7844766Z [W1204 09:55:12.051211707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7844769Z 
2025-12-04T10:11:57.7845053Z [W1204 09:55:12.055615111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7845064Z 
2025-12-04T10:11:57.7845355Z [W1204 09:55:12.056061458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7845358Z 
2025-12-04T10:11:57.7845757Z [W1204 09:55:12.056199000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7845760Z 
2025-12-04T10:11:57.7845825Z FAILED [0.4975s] [100%]
2025-12-04T10:11:57.7845828Z 
2025-12-04T10:11:57.7845910Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7846234Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7846307Z Traceback (most recent call last):
2025-12-04T10:11:57.7846616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7846684Z     method(*args, **kwargs)
2025-12-04T10:11:57.7846976Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7847039Z     method(*args, **kwargs)
2025-12-04T10:11:57.7847330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7847391Z     with policy():
2025-12-04T10:11:57.7847684Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7847751Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7848578Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7848586Z 
2025-12-04T10:11:57.7848724Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7849245Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7849249Z 
2025-12-04T10:11:57.7849411Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7849537Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7849634Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7850213Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7850350Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7850416Z graph_break []
2025-12-04T10:11:57.7850541Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7851232Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7851305Z   if out == self.unknown_value:
2025-12-04T10:11:57.7851592Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7851668Z Traceback (most recent call last):
2025-12-04T10:11:57.7851966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7852028Z     method(*args, **kwargs)
2025-12-04T10:11:57.7852318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7852381Z     method(*args, **kwargs)
2025-12-04T10:11:57.7852669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7852764Z     with policy():
2025-12-04T10:11:57.7853057Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7853125Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7853922Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7853961Z 
2025-12-04T10:11:57.7854089Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7854601Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7854605Z 
2025-12-04T10:11:57.7854759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7854883Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7854978Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7855521Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7855686Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7855744Z graph_break []
2025-12-04T10:11:57.7855868Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7856556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7856629Z   if out == self.unknown_value:
2025-12-04T10:11:57.7856750Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7856839Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7857002Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7857541Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7857600Z graph_break []
2025-12-04T10:11:57.7857683Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7857973Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7858048Z Traceback (most recent call last):
2025-12-04T10:11:57.7858344Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7858409Z     method(*args, **kwargs)
2025-12-04T10:11:57.7858701Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7858767Z     method(*args, **kwargs)
2025-12-04T10:11:57.7859058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7859115Z     with policy():
2025-12-04T10:11:57.7859403Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7859471Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7860306Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7860310Z 
2025-12-04T10:11:57.7860439Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7861015Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7861021Z 
2025-12-04T10:11:57.7861174Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7861301Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7861389Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7861934Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7862057Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7862112Z graph_break []
2025-12-04T10:11:57.7862237Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7862964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7863036Z   if out == self.unknown_value:
2025-12-04T10:11:57.7863157Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7863246Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7863367Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7863902Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7863964Z graph_break []
2025-12-04T10:11:57.7864120Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7864206Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7864332Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7864863Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7864926Z graph_break []
2025-12-04T10:11:57.7865427Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-39442f28ac15f7dd.xml -
2025-12-04T10:11:57.7865526Z =========================== short test summary info ============================
2025-12-04T10:11:57.7866801Z FAILED [0.4975s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7866807Z 
2025-12-04T10:11:57.7866931Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7867488Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7867492Z 
2025-12-04T10:11:57.7867648Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7867762Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7867916Z ================== 1 failed, 57 deselected, 2 rerun in 11.91s ==================
2025-12-04T10:11:57.7867977Z Got exit code 1
2025-12-04T10:11:57.7868043Z Retrying single test...
2025-12-04T10:11:57.7868313Z W1204 09:55:18.710000 54882 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7868700Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b5211a64a27fb03.xml
2025-12-04T10:11:57.7868798Z ============================= test session starts ==============================
2025-12-04T10:11:57.7869005Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7869072Z cachedir: .pytest_cache
2025-12-04T10:11:57.7869377Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7869464Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7869534Z configfile: pytest.ini
2025-12-04T10:11:57.7869849Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7870017Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7870588Z stepcurrent: skipping 36 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7870658Z Running 1 items in this shard
2025-12-04T10:11:57.7870662Z 
2025-12-04T10:11:57.7871393Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:55:20.303032215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7871432Z 
2025-12-04T10:11:57.7871728Z [W1204 09:55:29.179541590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7871731Z 
2025-12-04T10:11:57.7872026Z [W1204 09:55:29.179794475 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7872029Z 
2025-12-04T10:11:57.7872315Z [W1204 09:55:29.185835228 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7872318Z 
2025-12-04T10:11:57.7872612Z [W1204 09:55:29.186432518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7872615Z 
2025-12-04T10:11:57.7872897Z [W1204 09:55:29.186598181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7872901Z 
2025-12-04T10:11:57.7873192Z [W1204 09:55:29.192032814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7873196Z 
2025-12-04T10:11:57.7873481Z [W1204 09:55:29.192590753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7873485Z 
2025-12-04T10:11:57.7873769Z [W1204 09:55:29.192764986 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7873772Z 
2025-12-04T10:11:57.7873856Z ('RERUN', {'yellow': True}) [10.8225s] [100%]
2025-12-04T10:11:57.7874604Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:55:30.992479370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7874641Z 
2025-12-04T10:11:57.7874933Z [W1204 09:55:30.993006509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7874937Z 
2025-12-04T10:11:57.7875225Z [W1204 09:55:30.993145221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7875228Z 
2025-12-04T10:11:57.7875516Z [W1204 09:55:30.995998550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7875520Z 
2025-12-04T10:11:57.7875804Z [W1204 09:55:30.996457008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7875807Z 
2025-12-04T10:11:57.7876101Z [W1204 09:55:30.996594360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7876106Z 
2025-12-04T10:11:57.7876391Z [W1204 09:55:30.001141148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7876396Z 
2025-12-04T10:11:57.7876721Z [W1204 09:55:30.001620056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7876729Z 
2025-12-04T10:11:57.7877013Z [W1204 09:55:30.001758248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7877018Z 
2025-12-04T10:11:57.7877094Z ('RERUN', {'yellow': True}) [0.4968s] [100%]
2025-12-04T10:11:57.7877811Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 09:55:30.487933356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7877816Z 
2025-12-04T10:11:57.7878101Z [W1204 09:55:30.488478596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7878138Z 
2025-12-04T10:11:57.7878429Z [W1204 09:55:30.488619478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7878432Z 
2025-12-04T10:11:57.7878716Z [W1204 09:55:30.491505898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7878719Z 
2025-12-04T10:11:57.7879008Z [W1204 09:55:30.491954685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7879012Z 
2025-12-04T10:11:57.7879298Z [W1204 09:55:30.492141558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7879301Z 
2025-12-04T10:11:57.7879590Z [W1204 09:55:30.496600445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7879595Z 
2025-12-04T10:11:57.7879921Z [W1204 09:55:30.497052333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7879924Z 
2025-12-04T10:11:57.7880210Z [W1204 09:55:30.497187816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7880213Z 
2025-12-04T10:11:57.7880276Z FAILED [0.4970s] [100%]
2025-12-04T10:11:57.7880279Z 
2025-12-04T10:11:57.7880399Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7880689Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7880765Z Traceback (most recent call last):
2025-12-04T10:11:57.7881073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7881178Z     method(*args, **kwargs)
2025-12-04T10:11:57.7881473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7881539Z     method(*args, **kwargs)
2025-12-04T10:11:57.7881827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7881886Z     with policy():
2025-12-04T10:11:57.7882189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7882257Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7883047Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7883058Z 
2025-12-04T10:11:57.7883186Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7883753Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7883757Z 
2025-12-04T10:11:57.7883922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7884049Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7884148Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7884696Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7884862Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7884923Z graph_break []
2025-12-04T10:11:57.7885047Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7885740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7885811Z   if out == self.unknown_value:
2025-12-04T10:11:57.7886100Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7886176Z Traceback (most recent call last):
2025-12-04T10:11:57.7886473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7886537Z     method(*args, **kwargs)
2025-12-04T10:11:57.7886832Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7886895Z     method(*args, **kwargs)
2025-12-04T10:11:57.7887183Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7887241Z     with policy():
2025-12-04T10:11:57.7887532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7887600Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7888754Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7888795Z 
2025-12-04T10:11:57.7888933Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7889454Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7889457Z 
2025-12-04T10:11:57.7889616Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7889741Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7889835Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7890388Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7890515Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7890576Z graph_break []
2025-12-04T10:11:57.7890701Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7891427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7891501Z   if out == self.unknown_value:
2025-12-04T10:11:57.7891622Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7891712Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7891839Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7892378Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7892475Z graph_break []
2025-12-04T10:11:57.7892561Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7892848Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.7892926Z Traceback (most recent call last):
2025-12-04T10:11:57.7893222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7893288Z     method(*args, **kwargs)
2025-12-04T10:11:57.7893576Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7893641Z     method(*args, **kwargs)
2025-12-04T10:11:57.7893929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7893989Z     with policy():
2025-12-04T10:11:57.7894280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7894349Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7895153Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7895157Z 
2025-12-04T10:11:57.7895321Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7895835Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7895871Z 
2025-12-04T10:11:57.7896028Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7896154Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7896243Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7896786Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7896910Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7896967Z graph_break []
2025-12-04T10:11:57.7897093Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7897778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7897854Z   if out == self.unknown_value:
2025-12-04T10:11:57.7897974Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7898099Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7898224Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7898760Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7898822Z graph_break []
2025-12-04T10:11:57.7898944Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7899030Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7899152Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7899725Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7899786Z graph_break []
2025-12-04T10:11:57.7900275Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b5211a64a27fb03.xml -
2025-12-04T10:11:57.7900375Z =========================== short test summary info ============================
2025-12-04T10:11:57.7901665Z FAILED [0.4970s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7901673Z 
2025-12-04T10:11:57.7901801Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7902322Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7902326Z 
2025-12-04T10:11:57.7902482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7902622Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7902739Z ================== 1 failed, 57 deselected, 2 rerun in 11.84s ==================
2025-12-04T10:11:57.7902798Z Got exit code 1
2025-12-04T10:11:57.7903269Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.7903546Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.7903811Z W1204 09:55:37.147000 55069 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7904193Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-03811a38f7309b37.xml
2025-12-04T10:11:57.7904287Z ============================= test session starts ==============================
2025-12-04T10:11:57.7904497Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7904563Z cachedir: .pytest_cache
2025-12-04T10:11:57.7904869Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7904949Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7905014Z configfile: pytest.ini
2025-12-04T10:11:57.7905367Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7905497Z collecting ... collected 58 items / 37 deselected / 21 selected
2025-12-04T10:11:57.7905584Z stepcurrent: skipping 37 already run items.
2025-12-04T10:11:57.7905656Z Running 21 items in this shard
2025-12-04T10:11:57.7905660Z 
2025-12-04T10:11:57.7906161Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8839s] [  4%]
2025-12-04T10:11:57.7906652Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4860s] [  4%]
2025-12-04T10:11:57.7907105Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4938s] [  4%]
2025-12-04T10:11:57.7907144Z 
2025-12-04T10:11:57.7907232Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7907525Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7907598Z Traceback (most recent call last):
2025-12-04T10:11:57.7907913Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7907978Z     method(*args, **kwargs)
2025-12-04T10:11:57.7908270Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7908336Z     method(*args, **kwargs)
2025-12-04T10:11:57.7908626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7908692Z     with policy():
2025-12-04T10:11:57.7908989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7909053Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7909857Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7909895Z 
2025-12-04T10:11:57.7910023Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7910543Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7910582Z 
2025-12-04T10:11:57.7910741Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7910869Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7910966Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7911316Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7911449Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7911509Z graph_break []
2025-12-04T10:11:57.7911799Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7911878Z Traceback (most recent call last):
2025-12-04T10:11:57.7912169Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7912238Z     method(*args, **kwargs)
2025-12-04T10:11:57.7912566Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7912628Z     method(*args, **kwargs)
2025-12-04T10:11:57.7912918Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7912975Z     with policy():
2025-12-04T10:11:57.7913268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7913347Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7914167Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7914225Z 
2025-12-04T10:11:57.7914356Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7914875Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7914880Z 
2025-12-04T10:11:57.7915036Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7915163Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7915257Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7915603Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7915730Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7915789Z graph_break []
2025-12-04T10:11:57.7915913Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7916004Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7916125Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7916464Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7916523Z graph_break []
2025-12-04T10:11:57.7916645Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7916934Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7917155Z Traceback (most recent call last):
2025-12-04T10:11:57.7917465Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7917603Z     method(*args, **kwargs)
2025-12-04T10:11:57.7917896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7917959Z     method(*args, **kwargs)
2025-12-04T10:11:57.7918248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7918309Z     with policy():
2025-12-04T10:11:57.7918599Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7918670Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7919483Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7919489Z 
2025-12-04T10:11:57.7919611Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7920218Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7920222Z 
2025-12-04T10:11:57.7920380Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7920506Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7920598Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7920937Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7921064Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7921186Z graph_break []
2025-12-04T10:11:57.7921314Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7921404Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7921523Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7921866Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7921922Z graph_break []
2025-12-04T10:11:57.7922047Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7922138Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7922256Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7922593Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7922653Z graph_break []
2025-12-04T10:11:57.7923139Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-03811a38f7309b37.xml -
2025-12-04T10:11:57.7923239Z =========================== short test summary info ============================
2025-12-04T10:11:57.7924589Z FAILED [0.4938s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7924629Z 
2025-12-04T10:11:57.7924760Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7925281Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7925284Z 
2025-12-04T10:11:57.7925441Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7925545Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7925659Z ================== 1 failed, 37 deselected, 2 rerun in 2.89s ===================
2025-12-04T10:11:57.7925721Z Got exit code 1
2025-12-04T10:11:57.7925786Z Retrying single test...
2025-12-04T10:11:57.7926050Z W1204 09:55:46.856000 55257 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7926436Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c3f4a82c64f8b823.xml
2025-12-04T10:11:57.7926532Z ============================= test session starts ==============================
2025-12-04T10:11:57.7926774Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7926840Z cachedir: .pytest_cache
2025-12-04T10:11:57.7927144Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7927222Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7927287Z configfile: pytest.ini
2025-12-04T10:11:57.7927602Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7927741Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7928313Z stepcurrent: skipping 37 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7928424Z Running 1 items in this shard
2025-12-04T10:11:57.7928427Z 
2025-12-04T10:11:57.7929160Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:55:47.936291905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7929165Z 
2025-12-04T10:11:57.7929465Z [W1204 09:55:56.843829001 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7929469Z 
2025-12-04T10:11:57.7929758Z [W1204 09:55:56.844080485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7929763Z 
2025-12-04T10:11:57.7930054Z [W1204 09:55:56.849979076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7930058Z 
2025-12-04T10:11:57.7930347Z [W1204 09:55:56.850606637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7930350Z 
2025-12-04T10:11:57.7930637Z [W1204 09:55:56.850781189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7930641Z 
2025-12-04T10:11:57.7930962Z [W1204 09:55:56.856241443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7930966Z 
2025-12-04T10:11:57.7931256Z [W1204 09:55:56.856771073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7931264Z 
2025-12-04T10:11:57.7931584Z [W1204 09:55:56.856936635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7931589Z 
2025-12-04T10:11:57.7931671Z ('RERUN', {'yellow': True}) [10.7931s] [100%]
2025-12-04T10:11:57.7932399Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:55:58.069861275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7932403Z 
2025-12-04T10:11:57.7932696Z [W1204 09:55:58.070407264 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7932699Z 
2025-12-04T10:11:57.7932989Z [W1204 09:55:58.070560197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7932992Z 
2025-12-04T10:11:57.7933277Z [W1204 09:55:58.073418656 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7933282Z 
2025-12-04T10:11:57.7933603Z [W1204 09:55:58.073963395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7933607Z 
2025-12-04T10:11:57.7933892Z [W1204 09:55:58.074106357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7933895Z 
2025-12-04T10:11:57.7934185Z [W1204 09:55:58.078500032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7934189Z 
2025-12-04T10:11:57.7934475Z [W1204 09:55:58.078958000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7934479Z 
2025-12-04T10:11:57.7934765Z [W1204 09:55:58.079096672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7934805Z 
2025-12-04T10:11:57.7934886Z ('RERUN', {'yellow': True}) [0.4533s] [100%]
2025-12-04T10:11:57.7935610Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:55:58.520639124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7935615Z 
2025-12-04T10:11:57.7935908Z [W1204 09:55:58.521160804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7935911Z 
2025-12-04T10:11:57.7936208Z [W1204 09:55:58.521301296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7936211Z 
2025-12-04T10:11:57.7936502Z [W1204 09:55:58.524125595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7936506Z 
2025-12-04T10:11:57.7936793Z [W1204 09:55:58.524674664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7936797Z 
2025-12-04T10:11:57.7937086Z [W1204 09:55:58.524813387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7937089Z 
2025-12-04T10:11:57.7937374Z [W1204 09:55:58.529259023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7937412Z 
2025-12-04T10:11:57.7937697Z [W1204 09:55:58.529724541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7937702Z 
2025-12-04T10:11:57.7937985Z [W1204 09:55:58.529860963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7938024Z 
2025-12-04T10:11:57.7938086Z FAILED [0.4517s] [100%]
2025-12-04T10:11:57.7938090Z 
2025-12-04T10:11:57.7938177Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7938472Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7938548Z Traceback (most recent call last):
2025-12-04T10:11:57.7938855Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7938921Z     method(*args, **kwargs)
2025-12-04T10:11:57.7939219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7939283Z     method(*args, **kwargs)
2025-12-04T10:11:57.7939569Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7939634Z     with policy():
2025-12-04T10:11:57.7939983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7940054Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7940857Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7940864Z 
2025-12-04T10:11:57.7940993Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7941518Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7941556Z 
2025-12-04T10:11:57.7941715Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7941844Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7941936Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7942286Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7942421Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7942478Z graph_break []
2025-12-04T10:11:57.7942604Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7943296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7943368Z   if out == self.unknown_value:
2025-12-04T10:11:57.7943659Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7943732Z Traceback (most recent call last):
2025-12-04T10:11:57.7944029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7944093Z     method(*args, **kwargs)
2025-12-04T10:11:57.7944384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7944486Z     method(*args, **kwargs)
2025-12-04T10:11:57.7944775Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7944834Z     with policy():
2025-12-04T10:11:57.7945163Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7945228Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7946044Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7946048Z 
2025-12-04T10:11:57.7946173Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7946691Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7946694Z 
2025-12-04T10:11:57.7946850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7946977Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7947071Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7947450Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7947580Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7947636Z graph_break []
2025-12-04T10:11:57.7947759Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7948463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7948532Z   if out == self.unknown_value:
2025-12-04T10:11:57.7948656Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7948787Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7948910Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7949256Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7949313Z graph_break []
2025-12-04T10:11:57.7949394Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7949686Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7949756Z Traceback (most recent call last):
2025-12-04T10:11:57.7950066Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7950132Z     method(*args, **kwargs)
2025-12-04T10:11:57.7950425Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7950491Z     method(*args, **kwargs)
2025-12-04T10:11:57.7950779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7950838Z     with policy():
2025-12-04T10:11:57.7951137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7951201Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7952053Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7952090Z 
2025-12-04T10:11:57.7952215Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7952741Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7952745Z 
2025-12-04T10:11:57.7952903Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7953025Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7953114Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7953462Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7953586Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7953643Z graph_break []
2025-12-04T10:11:57.7953767Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7954487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7954556Z   if out == self.unknown_value:
2025-12-04T10:11:57.7954678Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7954773Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7954895Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7955241Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7955299Z graph_break []
2025-12-04T10:11:57.7955421Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7955630Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7955752Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7956091Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7956152Z graph_break []
2025-12-04T10:11:57.7956640Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c3f4a82c64f8b823.xml -
2025-12-04T10:11:57.7956747Z =========================== short test summary info ============================
2025-12-04T10:11:57.7958036Z FAILED [0.4517s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7958043Z 
2025-12-04T10:11:57.7958180Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7958701Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7958705Z 
2025-12-04T10:11:57.7958903Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7959008Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7959124Z ================== 1 failed, 57 deselected, 2 rerun in 11.72s ==================
2025-12-04T10:11:57.7959222Z Got exit code 1
2025-12-04T10:11:57.7959286Z Retrying single test...
2025-12-04T10:11:57.7959550Z W1204 09:56:05.196000 55450 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7959976Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c722331da90a17a1.xml
2025-12-04T10:11:57.7960071Z ============================= test session starts ==============================
2025-12-04T10:11:57.7960281Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7960347Z cachedir: .pytest_cache
2025-12-04T10:11:57.7960650Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7960729Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7960793Z configfile: pytest.ini
2025-12-04T10:11:57.7961107Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7961242Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.7961853Z stepcurrent: skipping 37 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7961928Z Running 1 items in this shard
2025-12-04T10:11:57.7961933Z 
2025-12-04T10:11:57.7962664Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:56:06.293926580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7962667Z 
2025-12-04T10:11:57.7962970Z [W1204 09:56:15.296020267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7963009Z 
2025-12-04T10:11:57.7963303Z [W1204 09:56:15.296269911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7963307Z 
2025-12-04T10:11:57.7963598Z [W1204 09:56:15.302126701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7963602Z 
2025-12-04T10:11:57.7963890Z [W1204 09:56:15.302717041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7963894Z 
2025-12-04T10:11:57.7964181Z [W1204 09:56:15.302888004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7964185Z 
2025-12-04T10:11:57.7964478Z [W1204 09:56:15.308270636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7964484Z 
2025-12-04T10:11:57.7964781Z [W1204 09:56:15.308791585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7964786Z 
2025-12-04T10:11:57.7965077Z [W1204 09:56:15.308951698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7965080Z 
2025-12-04T10:11:57.7965161Z ('RERUN', {'yellow': True}) [10.9060s] [100%]
2025-12-04T10:11:57.7965921Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:56:16.524140008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7965926Z 
2025-12-04T10:11:57.7966215Z [W1204 09:56:16.524681957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7966269Z 
2025-12-04T10:11:57.7966560Z [W1204 09:56:16.524819159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7966565Z 
2025-12-04T10:11:57.7966849Z [W1204 09:56:16.527809150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7966853Z 
2025-12-04T10:11:57.7967137Z [W1204 09:56:16.528385280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7967145Z 
2025-12-04T10:11:57.7967430Z [W1204 09:56:16.528523333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7967433Z 
2025-12-04T10:11:57.7967718Z [W1204 09:56:16.533187112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7967724Z 
2025-12-04T10:11:57.7968012Z [W1204 09:56:16.533681681 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7968050Z 
2025-12-04T10:11:57.7968335Z [W1204 09:56:16.533824683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7968339Z 
2025-12-04T10:11:57.7968418Z ('RERUN', {'yellow': True}) [0.4553s] [100%]
2025-12-04T10:11:57.7969143Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 09:56:17.976013959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7969146Z 
2025-12-04T10:11:57.7969435Z [W1204 09:56:17.976560149 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7969473Z 
2025-12-04T10:11:57.7969758Z [W1204 09:56:17.976706161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7969763Z 
2025-12-04T10:11:57.7970051Z [W1204 09:56:17.979665782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7970054Z 
2025-12-04T10:11:57.7970339Z [W1204 09:56:17.980253322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7970342Z 
2025-12-04T10:11:57.7970632Z [W1204 09:56:17.980405175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7970639Z 
2025-12-04T10:11:57.7970924Z [W1204 09:56:17.985030734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7970930Z 
2025-12-04T10:11:57.7971216Z [W1204 09:56:17.985495142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7971219Z 
2025-12-04T10:11:57.7971510Z [W1204 09:56:17.985632494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.7971513Z 
2025-12-04T10:11:57.7971573Z FAILED [0.4493s] [100%]
2025-12-04T10:11:57.7971576Z 
2025-12-04T10:11:57.7971661Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7972001Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7972079Z Traceback (most recent call last):
2025-12-04T10:11:57.7972393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7972495Z     method(*args, **kwargs)
2025-12-04T10:11:57.7972793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7972857Z     method(*args, **kwargs)
2025-12-04T10:11:57.7973151Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7973214Z     with policy():
2025-12-04T10:11:57.7973507Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7973569Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7974375Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7974383Z 
2025-12-04T10:11:57.7974511Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7975070Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7975074Z 
2025-12-04T10:11:57.7975231Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7975361Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7975452Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7975801Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7975930Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7975990Z graph_break []
2025-12-04T10:11:57.7976149Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7976864Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7976934Z   if out == self.unknown_value:
2025-12-04T10:11:57.7977224Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7977296Z Traceback (most recent call last):
2025-12-04T10:11:57.7977591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7977660Z     method(*args, **kwargs)
2025-12-04T10:11:57.7977951Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7978022Z     method(*args, **kwargs)
2025-12-04T10:11:57.7978308Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7978367Z     with policy():
2025-12-04T10:11:57.7978660Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7978725Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7979582Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.7979587Z 
2025-12-04T10:11:57.7979714Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7980267Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7980275Z 
2025-12-04T10:11:57.7980434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7980558Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7980652Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7980997Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7981125Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7981189Z graph_break []
2025-12-04T10:11:57.7981310Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7982010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7982121Z   if out == self.unknown_value:
2025-12-04T10:11:57.7982249Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7982343Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7982468Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7982811Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7982872Z graph_break []
2025-12-04T10:11:57.7982954Z =================================== FAILURES ===================================
2025-12-04T10:11:57.7983250Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.7983357Z Traceback (most recent call last):
2025-12-04T10:11:57.7983656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7983721Z     method(*args, **kwargs)
2025-12-04T10:11:57.7984011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7984087Z     method(*args, **kwargs)
2025-12-04T10:11:57.7984378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7984439Z     with policy():
2025-12-04T10:11:57.7984734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7984798Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7985608Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7985618Z 
2025-12-04T10:11:57.7985743Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7986260Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7986263Z 
2025-12-04T10:11:57.7986457Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7986583Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7986678Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7987056Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7987179Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7987240Z graph_break []
2025-12-04T10:11:57.7987363Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.7988058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.7988129Z   if out == self.unknown_value:
2025-12-04T10:11:57.7988253Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7988345Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7988468Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7988813Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7988910Z graph_break []
2025-12-04T10:11:57.7989033Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.7989121Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.7989243Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.7989583Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.7989645Z graph_break []
2025-12-04T10:11:57.7990129Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c722331da90a17a1.xml -
2025-12-04T10:11:57.7990266Z =========================== short test summary info ============================
2025-12-04T10:11:57.7991553Z FAILED [0.4493s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.7991558Z 
2025-12-04T10:11:57.7991689Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.7992207Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7992212Z 
2025-12-04T10:11:57.7992366Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.7992473Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.7992591Z ================== 1 failed, 57 deselected, 2 rerun in 11.83s ==================
2025-12-04T10:11:57.7992653Z Got exit code 1
2025-12-04T10:11:57.7993127Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.7993367Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.7993688Z W1204 09:56:23.601000 55643 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.7994079Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-715dcfb7265e7117.xml
2025-12-04T10:11:57.7994212Z ============================= test session starts ==============================
2025-12-04T10:11:57.7994420Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.7994486Z cachedir: .pytest_cache
2025-12-04T10:11:57.7994793Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.7994866Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.7994931Z configfile: pytest.ini
2025-12-04T10:11:57.7995246Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.7995372Z collecting ... collected 58 items / 38 deselected / 20 selected
2025-12-04T10:11:57.7995459Z stepcurrent: skipping 38 already run items.
2025-12-04T10:11:57.7995526Z Running 20 items in this shard
2025-12-04T10:11:57.7995531Z 
2025-12-04T10:11:57.7996024Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8709s] [  5%]
2025-12-04T10:11:57.7996550Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4537s] [  5%]
2025-12-04T10:11:57.7996992Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4477s] [  5%]
2025-12-04T10:11:57.7996995Z 
2025-12-04T10:11:57.7997084Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.7997370Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.7997444Z Traceback (most recent call last):
2025-12-04T10:11:57.7997801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7997866Z     method(*args, **kwargs)
2025-12-04T10:11:57.7998162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.7998224Z     method(*args, **kwargs)
2025-12-04T10:11:57.7998512Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.7998574Z     with policy():
2025-12-04T10:11:57.7998869Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.7998934Z     raise RuntimeError(msg)
2025-12-04T10:11:57.7999724Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.7999731Z 
2025-12-04T10:11:57.7999860Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8000417Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8000420Z 
2025-12-04T10:11:57.8000580Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8000748Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8000843Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8001190Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8001355Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8001416Z graph_break []
2025-12-04T10:11:57.8001706Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8001780Z Traceback (most recent call last):
2025-12-04T10:11:57.8002081Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8002146Z     method(*args, **kwargs)
2025-12-04T10:11:57.8002438Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8002499Z     method(*args, **kwargs)
2025-12-04T10:11:57.8002807Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8002868Z     with policy():
2025-12-04T10:11:57.8003168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8003234Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8004066Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8004071Z 
2025-12-04T10:11:57.8004200Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8004716Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8004720Z 
2025-12-04T10:11:57.8004880Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8005042Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8005133Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8005487Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8005609Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8005669Z graph_break []
2025-12-04T10:11:57.8005794Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8005881Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8006006Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8006342Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8006405Z graph_break []
2025-12-04T10:11:57.8006487Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8006772Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8006848Z Traceback (most recent call last):
2025-12-04T10:11:57.8007146Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8007208Z     method(*args, **kwargs)
2025-12-04T10:11:57.8007537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8007601Z     method(*args, **kwargs)
2025-12-04T10:11:57.8007892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8007984Z     with policy():
2025-12-04T10:11:57.8008277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8008345Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8009154Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8009158Z 
2025-12-04T10:11:57.8009284Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8009797Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8009801Z 
2025-12-04T10:11:57.8009967Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8010100Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8010190Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8010572Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8010696Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8010753Z graph_break []
2025-12-04T10:11:57.8010879Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8010967Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8011087Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8011430Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8011527Z graph_break []
2025-12-04T10:11:57.8011651Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8011740Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8011860Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8012199Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8012255Z graph_break []
2025-12-04T10:11:57.8012742Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-715dcfb7265e7117.xml -
2025-12-04T10:11:57.8012843Z =========================== short test summary info ============================
2025-12-04T10:11:57.8014116Z FAILED [0.4477s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8014126Z 
2025-12-04T10:11:57.8014249Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8014802Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8014806Z 
2025-12-04T10:11:57.8014963Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8015067Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8015220Z ================== 1 failed, 38 deselected, 2 rerun in 2.80s ===================
2025-12-04T10:11:57.8015278Z Got exit code 1
2025-12-04T10:11:57.8015342Z Retrying single test...
2025-12-04T10:11:57.8015610Z W1204 09:56:33.263000 55824 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8015992Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af0a42bb02245e10.xml
2025-12-04T10:11:57.8016089Z ============================= test session starts ==============================
2025-12-04T10:11:57.8016299Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8016365Z cachedir: .pytest_cache
2025-12-04T10:11:57.8016675Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8016754Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8016818Z configfile: pytest.ini
2025-12-04T10:11:57.8017345Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8017486Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8018067Z stepcurrent: skipping 38 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8018144Z Running 1 items in this shard
2025-12-04T10:11:57.8018148Z 
2025-12-04T10:11:57.8018882Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:56:34.294440433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8018958Z 
2025-12-04T10:11:57.8019257Z [W1204 09:56:43.626429096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8019261Z 
2025-12-04T10:11:57.8019552Z [W1204 09:56:43.626685660 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8019555Z 
2025-12-04T10:11:57.8019847Z [W1204 09:56:43.632544417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8019851Z 
2025-12-04T10:11:57.8020136Z [W1204 09:56:43.633130987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8020139Z 
2025-12-04T10:11:57.8020427Z [W1204 09:56:43.633298570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8020434Z 
2025-12-04T10:11:57.8020717Z [W1204 09:56:43.638681728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8020721Z 
2025-12-04T10:11:57.8021011Z [W1204 09:56:43.639211068 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8021015Z 
2025-12-04T10:11:57.8021298Z [W1204 09:56:43.639374481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8021302Z 
2025-12-04T10:11:57.8021384Z ('RERUN', {'yellow': True}) [11.1662s] [100%]
2025-12-04T10:11:57.8022161Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:56:44.805550478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8022209Z 
2025-12-04T10:11:57.8022500Z [W1204 09:56:44.806129398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8022503Z 
2025-12-04T10:11:57.8022794Z [W1204 09:56:44.806278641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8022797Z 
2025-12-04T10:11:57.8023079Z [W1204 09:56:44.809226545 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8023082Z 
2025-12-04T10:11:57.8023376Z [W1204 09:56:44.809791685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8023379Z 
2025-12-04T10:11:57.8023668Z [W1204 09:56:44.809933568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8023673Z 
2025-12-04T10:11:57.8023963Z [W1204 09:56:44.814477549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8023966Z 
2025-12-04T10:11:57.8024286Z [W1204 09:56:44.814940148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8024290Z 
2025-12-04T10:11:57.8024581Z [W1204 09:56:44.815077890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8024584Z 
2025-12-04T10:11:57.8024663Z ('RERUN', {'yellow': True}) [0.4097s] [100%]
2025-12-04T10:11:57.8025378Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:56:45.213234814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8025386Z 
2025-12-04T10:11:57.8025706Z [W1204 09:56:45.213794894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8025709Z 
2025-12-04T10:11:57.8025998Z [W1204 09:56:45.213932307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8026001Z 
2025-12-04T10:11:57.8026290Z [W1204 09:56:45.216755347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8026293Z 
2025-12-04T10:11:57.8026580Z [W1204 09:56:45.217294347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8026583Z 
2025-12-04T10:11:57.8026875Z [W1204 09:56:45.217431960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8026879Z 
2025-12-04T10:11:57.8027164Z [W1204 09:56:45.221827179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8027168Z 
2025-12-04T10:11:57.8027465Z [W1204 09:56:45.222280838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8027468Z 
2025-12-04T10:11:57.8027753Z [W1204 09:56:45.222417630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8027757Z 
2025-12-04T10:11:57.8027818Z FAILED [0.4054s] [100%]
2025-12-04T10:11:57.8027826Z 
2025-12-04T10:11:57.8027942Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8028243Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8028320Z Traceback (most recent call last):
2025-12-04T10:11:57.8028670Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8028740Z     method(*args, **kwargs)
2025-12-04T10:11:57.8029041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8029103Z     method(*args, **kwargs)
2025-12-04T10:11:57.8029395Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8029452Z     with policy():
2025-12-04T10:11:57.8029747Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8029816Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8030605Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8030612Z 
2025-12-04T10:11:57.8030743Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8031292Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8031297Z 
2025-12-04T10:11:57.8031455Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8031585Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8031679Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8032030Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8032157Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8032249Z graph_break []
2025-12-04T10:11:57.8032379Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8033075Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8033149Z   if out == self.unknown_value:
2025-12-04T10:11:57.8033437Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8033511Z Traceback (most recent call last):
2025-12-04T10:11:57.8033809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8033871Z     method(*args, **kwargs)
2025-12-04T10:11:57.8034164Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8034232Z     method(*args, **kwargs)
2025-12-04T10:11:57.8034521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8034593Z     with policy():
2025-12-04T10:11:57.8034890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8034955Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8035795Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8035799Z 
2025-12-04T10:11:57.8035960Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8036482Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8036486Z 
2025-12-04T10:11:57.8036643Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8036769Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8036862Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8037207Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8037335Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8037394Z graph_break []
2025-12-04T10:11:57.8037519Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8038257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8038327Z   if out == self.unknown_value:
2025-12-04T10:11:57.8038453Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8038542Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8038662Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8039005Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8039063Z graph_break []
2025-12-04T10:11:57.8039149Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8039474Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8039546Z Traceback (most recent call last):
2025-12-04T10:11:57.8039845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8039950Z     method(*args, **kwargs)
2025-12-04T10:11:57.8040242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8040311Z     method(*args, **kwargs)
2025-12-04T10:11:57.8040602Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8040663Z     with policy():
2025-12-04T10:11:57.8040955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8041023Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8041835Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8041839Z 
2025-12-04T10:11:57.8041962Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8042666Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8042674Z 
2025-12-04T10:11:57.8042871Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8043007Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8043150Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8043499Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8043644Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8043705Z graph_break []
2025-12-04T10:11:57.8043833Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8044549Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8044619Z   if out == self.unknown_value:
2025-12-04T10:11:57.8044744Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8044838Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8044963Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8045367Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8045430Z graph_break []
2025-12-04T10:11:57.8045553Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8045644Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8045767Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8046118Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8046180Z graph_break []
2025-12-04T10:11:57.8046675Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af0a42bb02245e10.xml -
2025-12-04T10:11:57.8050866Z =========================== short test summary info ============================
2025-12-04T10:11:57.8052183Z FAILED [0.4054s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8052192Z 
2025-12-04T10:11:57.8052330Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8052855Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8052863Z 
2025-12-04T10:11:57.8053036Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8053151Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8053275Z ================== 1 failed, 57 deselected, 2 rerun in 12.00s ==================
2025-12-04T10:11:57.8053335Z Got exit code 1
2025-12-04T10:11:57.8053401Z Retrying single test...
2025-12-04T10:11:57.8053672Z W1204 09:56:51.892000 56010 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8054104Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5a947cb713f2103.xml
2025-12-04T10:11:57.8054207Z ============================= test session starts ==============================
2025-12-04T10:11:57.8054420Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8054526Z cachedir: .pytest_cache
2025-12-04T10:11:57.8054836Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8054916Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8054982Z configfile: pytest.ini
2025-12-04T10:11:57.8055305Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8055438Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8056013Z stepcurrent: skipping 38 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8056098Z Running 1 items in this shard
2025-12-04T10:11:57.8056102Z 
2025-12-04T10:11:57.8056874Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:56:52.939973852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8056881Z 
2025-12-04T10:11:57.8057186Z [W1204 09:57:02.114876196 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8057189Z 
2025-12-04T10:11:57.8057478Z [W1204 09:57:02.115112780 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8057482Z 
2025-12-04T10:11:57.8057794Z [W1204 09:57:02.120577807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8057797Z 
2025-12-04T10:11:57.8058096Z [W1204 09:57:02.121118527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8058135Z 
2025-12-04T10:11:57.8058430Z [W1204 09:57:02.121299610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8058433Z 
2025-12-04T10:11:57.8058721Z [W1204 09:57:02.126642794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8058724Z 
2025-12-04T10:11:57.8059014Z [W1204 09:57:02.127147263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8059018Z 
2025-12-04T10:11:57.8059304Z [W1204 09:57:02.127307176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8059307Z 
2025-12-04T10:11:57.8059391Z ('RERUN', {'yellow': True}) [11.0215s] [100%]
2025-12-04T10:11:57.8060129Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:57:03.292542369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8060134Z 
2025-12-04T10:11:57.8060424Z [W1204 09:57:03.293125680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8060427Z 
2025-12-04T10:11:57.8060717Z [W1204 09:57:03.293274052 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8060721Z 
2025-12-04T10:11:57.8061043Z [W1204 09:57:03.296197904 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8061047Z 
2025-12-04T10:11:57.8061338Z [W1204 09:57:03.296777414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8061376Z 
2025-12-04T10:11:57.8061662Z [W1204 09:57:03.296917897 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8061666Z 
2025-12-04T10:11:57.8061954Z [W1204 09:57:03.301421836 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8061959Z 
2025-12-04T10:11:57.8062244Z [W1204 09:57:03.301890865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8062247Z 
2025-12-04T10:11:57.8062538Z [W1204 09:57:03.302034497 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8062541Z 
2025-12-04T10:11:57.8062621Z ('RERUN', {'yellow': True}) [0.4095s] [100%]
2025-12-04T10:11:57.8063340Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 09:57:03.699437681 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8063380Z 
2025-12-04T10:11:57.8063672Z [W1204 09:57:03.700021791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8063675Z 
2025-12-04T10:11:57.8063961Z [W1204 09:57:03.700176804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8063965Z 
2025-12-04T10:11:57.8064254Z [W1204 09:57:03.703069325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8064257Z 
2025-12-04T10:11:57.8064546Z [W1204 09:57:03.703627715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8064584Z 
2025-12-04T10:11:57.8064881Z [W1204 09:57:03.703767297 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8064886Z 
2025-12-04T10:11:57.8065175Z [W1204 09:57:03.708216256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8065178Z 
2025-12-04T10:11:57.8065467Z [W1204 09:57:03.708683494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8065470Z 
2025-12-04T10:11:57.8065756Z [W1204 09:57:03.708824177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8065759Z 
2025-12-04T10:11:57.8065822Z FAILED [0.4080s] [100%]
2025-12-04T10:11:57.8065825Z 
2025-12-04T10:11:57.8065914Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8066210Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8066290Z Traceback (most recent call last):
2025-12-04T10:11:57.8066605Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8066670Z     method(*args, **kwargs)
2025-12-04T10:11:57.8066969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8067033Z     method(*args, **kwargs)
2025-12-04T10:11:57.8067360Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8067422Z     with policy():
2025-12-04T10:11:57.8067718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8067823Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8068622Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8068627Z 
2025-12-04T10:11:57.8068764Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8069281Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8069285Z 
2025-12-04T10:11:57.8069445Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8069578Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8069677Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8070032Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8070270Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8070330Z graph_break []
2025-12-04T10:11:57.8070459Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8071161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8071235Z   if out == self.unknown_value:
2025-12-04T10:11:57.8071525Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8071635Z Traceback (most recent call last):
2025-12-04T10:11:57.8071942Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8072006Z     method(*args, **kwargs)
2025-12-04T10:11:57.8072297Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8072363Z     method(*args, **kwargs)
2025-12-04T10:11:57.8072651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8072711Z     with policy():
2025-12-04T10:11:57.8073006Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8073071Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8073879Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8073887Z 
2025-12-04T10:11:57.8074017Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8074534Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8074538Z 
2025-12-04T10:11:57.8074694Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8074856Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8074954Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8075303Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8075492Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8075551Z graph_break []
2025-12-04T10:11:57.8075674Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8076368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8076436Z   if out == self.unknown_value:
2025-12-04T10:11:57.8076570Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8076661Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8076783Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8077135Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8077195Z graph_break []
2025-12-04T10:11:57.8077313Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8077605Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8077678Z Traceback (most recent call last):
2025-12-04T10:11:57.8077981Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8078043Z     method(*args, **kwargs)
2025-12-04T10:11:57.8078334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8078398Z     method(*args, **kwargs)
2025-12-04T10:11:57.8078695Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8078795Z     with policy():
2025-12-04T10:11:57.8079090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8079155Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8080062Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8080067Z 
2025-12-04T10:11:57.8080197Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8080717Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8080724Z 
2025-12-04T10:11:57.8080887Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8081014Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8081112Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8081456Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8081584Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8081644Z graph_break []
2025-12-04T10:11:57.8081816Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8082512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8082616Z   if out == self.unknown_value:
2025-12-04T10:11:57.8082740Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8082830Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8082950Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8083295Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8083353Z graph_break []
2025-12-04T10:11:57.8083475Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8083561Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8083682Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8084022Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8084084Z graph_break []
2025-12-04T10:11:57.8084604Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5a947cb713f2103.xml -
2025-12-04T10:11:57.8084710Z =========================== short test summary info ============================
2025-12-04T10:11:57.8085986Z FAILED [0.4080s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8086025Z 
2025-12-04T10:11:57.8086152Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8086664Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8086667Z 
2025-12-04T10:11:57.8086824Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8086926Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8087041Z ================== 1 failed, 57 deselected, 2 rerun in 11.86s ==================
2025-12-04T10:11:57.8087103Z Got exit code 1
2025-12-04T10:11:57.8087569Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8087817Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.8088086Z W1204 09:57:10.291000 56196 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8088479Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c9a860fbca8c784e.xml
2025-12-04T10:11:57.8088587Z ============================= test session starts ==============================
2025-12-04T10:11:57.8088794Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8088860Z cachedir: .pytest_cache
2025-12-04T10:11:57.8089204Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8089282Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8089349Z configfile: pytest.ini
2025-12-04T10:11:57.8089699Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8089827Z collecting ... collected 58 items / 39 deselected / 19 selected
2025-12-04T10:11:57.8089920Z stepcurrent: skipping 39 already run items.
2025-12-04T10:11:57.8089990Z Running 19 items in this shard
2025-12-04T10:11:57.8089994Z 
2025-12-04T10:11:57.8090486Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9148s] [  5%]
2025-12-04T10:11:57.8090972Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5328s] [  5%]
2025-12-04T10:11:57.8091405Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.5290s] [  5%]
2025-12-04T10:11:57.8091413Z 
2025-12-04T10:11:57.8091494Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8091817Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8091895Z Traceback (most recent call last):
2025-12-04T10:11:57.8092204Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8092269Z     method(*args, **kwargs)
2025-12-04T10:11:57.8092564Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8092624Z     method(*args, **kwargs)
2025-12-04T10:11:57.8092920Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8092981Z     with policy():
2025-12-04T10:11:57.8093322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8093389Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8094187Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8094191Z 
2025-12-04T10:11:57.8094322Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8094835Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8094838Z 
2025-12-04T10:11:57.8094993Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8095124Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8095216Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8095767Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8095896Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8095959Z graph_break []
2025-12-04T10:11:57.8096284Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8096357Z Traceback (most recent call last):
2025-12-04T10:11:57.8096653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8096766Z     method(*args, **kwargs)
2025-12-04T10:11:57.8097061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8097128Z     method(*args, **kwargs)
2025-12-04T10:11:57.8097419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8097476Z     with policy():
2025-12-04T10:11:57.8097770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8097835Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8098644Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8098650Z 
2025-12-04T10:11:57.8098774Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8099318Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8099327Z 
2025-12-04T10:11:57.8099481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8099605Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8099700Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8100244Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8100373Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8100477Z graph_break []
2025-12-04T10:11:57.8100602Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8100693Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8100813Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8101350Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8101412Z graph_break []
2025-12-04T10:11:57.8101495Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8101781Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8101851Z Traceback (most recent call last):
2025-12-04T10:11:57.8102148Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8102215Z     method(*args, **kwargs)
2025-12-04T10:11:57.8102514Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8102576Z     method(*args, **kwargs)
2025-12-04T10:11:57.8102873Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8102931Z     with policy():
2025-12-04T10:11:57.8103280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8103347Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8104147Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8104190Z 
2025-12-04T10:11:57.8104315Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8104825Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8104830Z 
2025-12-04T10:11:57.8104991Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8105120Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8105217Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8105761Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8105884Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8105945Z graph_break []
2025-12-04T10:11:57.8106100Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8106188Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8106312Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8106847Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8106911Z graph_break []
2025-12-04T10:11:57.8107030Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8107116Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8107285Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8107818Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8107878Z graph_break []
2025-12-04T10:11:57.8108366Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c9a860fbca8c784e.xml -
2025-12-04T10:11:57.8108465Z =========================== short test summary info ============================
2025-12-04T10:11:57.8109742Z FAILED [0.5290s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8109750Z 
2025-12-04T10:11:57.8109873Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8110386Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8110390Z 
2025-12-04T10:11:57.8110575Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8110684Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8110798Z ================== 1 failed, 39 deselected, 2 rerun in 3.00s ===================
2025-12-04T10:11:57.8110889Z Got exit code 1
2025-12-04T10:11:57.8110960Z Retrying single test...
2025-12-04T10:11:57.8111221Z W1204 09:57:19.956000 56378 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8111605Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-57d06208bb64cb40.xml
2025-12-04T10:11:57.8111699Z ============================= test session starts ==============================
2025-12-04T10:11:57.8111904Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8111971Z cachedir: .pytest_cache
2025-12-04T10:11:57.8112278Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8112352Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8112419Z configfile: pytest.ini
2025-12-04T10:11:57.8112734Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8112869Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8113493Z stepcurrent: skipping 39 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8113568Z Running 1 items in this shard
2025-12-04T10:11:57.8113572Z 
2025-12-04T10:11:57.8114311Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:57:21.538428472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8114315Z 
2025-12-04T10:11:57.8114615Z [W1204 09:57:30.564998080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8114655Z 
2025-12-04T10:11:57.8114948Z [W1204 09:57:30.565264425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8114952Z 
2025-12-04T10:11:57.8115247Z [W1204 09:57:30.571916840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8115250Z 
2025-12-04T10:11:57.8115544Z [W1204 09:57:30.572533652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8115548Z 
2025-12-04T10:11:57.8115838Z [W1204 09:57:30.572705895 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8115841Z 
2025-12-04T10:11:57.8116130Z [W1204 09:57:30.578163948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8116136Z 
2025-12-04T10:11:57.8116425Z [W1204 09:57:30.578759879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8116428Z 
2025-12-04T10:11:57.8116716Z [W1204 09:57:30.578929842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8116722Z 
2025-12-04T10:11:57.8116803Z ('RERUN', {'yellow': True}) [10.9693s] [100%]
2025-12-04T10:11:57.8117793Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:57:31.383582973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8117799Z 
2025-12-04T10:11:57.8118101Z [W1204 09:57:31.384130943 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8118156Z 
2025-12-04T10:11:57.8118457Z [W1204 09:57:31.384271186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8118460Z 
2025-12-04T10:11:57.8118752Z [W1204 09:57:31.387168150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8118755Z 
2025-12-04T10:11:57.8119040Z [W1204 09:57:31.387617809 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8119043Z 
2025-12-04T10:11:57.8119334Z [W1204 09:57:31.387755432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8119337Z 
2025-12-04T10:11:57.8119622Z [W1204 09:57:31.392373008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8119627Z 
2025-12-04T10:11:57.8119968Z [W1204 09:57:31.392830137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8119972Z 
2025-12-04T10:11:57.8120314Z [W1204 09:57:31.392967319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8120318Z 
2025-12-04T10:11:57.8120398Z ('RERUN', {'yellow': True}) [0.5013s] [100%]
2025-12-04T10:11:57.8121136Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:57:31.883574099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8121140Z 
2025-12-04T10:11:57.8121431Z [W1204 09:57:31.884114039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8121436Z 
2025-12-04T10:11:57.8121776Z [W1204 09:57:31.884255131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8121779Z 
2025-12-04T10:11:57.8122067Z [W1204 09:57:31.887165731 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8122071Z 
2025-12-04T10:11:57.8122363Z [W1204 09:57:31.887611449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8122367Z 
2025-12-04T10:11:57.8122655Z [W1204 09:57:31.887747541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8122659Z 
2025-12-04T10:11:57.8122950Z [W1204 09:57:31.892332781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8122954Z 
2025-12-04T10:11:57.8123239Z [W1204 09:57:31.892791999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8123244Z 
2025-12-04T10:11:57.8123532Z [W1204 09:57:31.892928701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8123537Z 
2025-12-04T10:11:57.8123598Z FAILED [0.4986s] [100%]
2025-12-04T10:11:57.8123601Z 
2025-12-04T10:11:57.8123681Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8123970Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8124080Z Traceback (most recent call last):
2025-12-04T10:11:57.8124395Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8124463Z     method(*args, **kwargs)
2025-12-04T10:11:57.8124790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8124857Z     method(*args, **kwargs)
2025-12-04T10:11:57.8125147Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8125204Z     with policy():
2025-12-04T10:11:57.8125504Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8125571Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8126371Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8126376Z 
2025-12-04T10:11:57.8126509Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8127062Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8127071Z 
2025-12-04T10:11:57.8127229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8127355Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8127452Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8128007Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8128135Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8128198Z graph_break []
2025-12-04T10:11:57.8128396Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8129095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8129166Z   if out == self.unknown_value:
2025-12-04T10:11:57.8129451Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8129525Z Traceback (most recent call last):
2025-12-04T10:11:57.8129824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8129891Z     method(*args, **kwargs)
2025-12-04T10:11:57.8130180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8130244Z     method(*args, **kwargs)
2025-12-04T10:11:57.8130545Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8130604Z     with policy():
2025-12-04T10:11:57.8130900Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8130969Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8131815Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8131819Z 
2025-12-04T10:11:57.8131950Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8132499Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8132505Z 
2025-12-04T10:11:57.8132665Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8132788Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8132879Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8133425Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8133554Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8133615Z graph_break []
2025-12-04T10:11:57.8133735Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8134463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8134537Z   if out == self.unknown_value:
2025-12-04T10:11:57.8134662Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8134750Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8134877Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8135416Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8135481Z graph_break []
2025-12-04T10:11:57.8135561Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8135896Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8135977Z Traceback (most recent call last):
2025-12-04T10:11:57.8136277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8136342Z     method(*args, **kwargs)
2025-12-04T10:11:57.8136634Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8136699Z     method(*args, **kwargs)
2025-12-04T10:11:57.8136994Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8137053Z     with policy():
2025-12-04T10:11:57.8137349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8137419Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8138236Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8138240Z 
2025-12-04T10:11:57.8138372Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8138923Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8138927Z 
2025-12-04T10:11:57.8139087Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8139209Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8139335Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8139880Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8140004Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8140067Z graph_break []
2025-12-04T10:11:57.8140187Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8140879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8140947Z   if out == self.unknown_value:
2025-12-04T10:11:57.8141069Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8141164Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8141285Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8141873Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8141936Z graph_break []
2025-12-04T10:11:57.8142061Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8142156Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8142274Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8142812Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8142909Z graph_break []
2025-12-04T10:11:57.8143398Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-57d06208bb64cb40.xml -
2025-12-04T10:11:57.8143499Z =========================== short test summary info ============================
2025-12-04T10:11:57.8144774Z FAILED [0.4986s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8144781Z 
2025-12-04T10:11:57.8144908Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8145425Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8145429Z 
2025-12-04T10:11:57.8145587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8145688Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8145803Z ================== 1 failed, 57 deselected, 2 rerun in 11.99s ==================
2025-12-04T10:11:57.8145863Z Got exit code 1
2025-12-04T10:11:57.8145962Z Retrying single test...
2025-12-04T10:11:57.8146222Z W1204 09:57:38.558000 56565 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8146606Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-27d39a08641974ca.xml
2025-12-04T10:11:57.8146735Z ============================= test session starts ==============================
2025-12-04T10:11:57.8146945Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8147012Z cachedir: .pytest_cache
2025-12-04T10:11:57.8147316Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8147395Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8147459Z configfile: pytest.ini
2025-12-04T10:11:57.8147773Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8147909Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8148475Z stepcurrent: skipping 39 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8148551Z Running 1 items in this shard
2025-12-04T10:11:57.8148554Z 
2025-12-04T10:11:57.8149315Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:57:40.155449721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8149320Z 
2025-12-04T10:11:57.8149624Z [W1204 09:57:49.240479018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8149628Z 
2025-12-04T10:11:57.8149922Z [W1204 09:57:49.240729032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8149925Z 
2025-12-04T10:11:57.8150217Z [W1204 09:57:49.246537042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8150254Z 
2025-12-04T10:11:57.8150547Z [W1204 09:57:49.247122893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8150550Z 
2025-12-04T10:11:57.8150837Z [W1204 09:57:49.247296356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8150840Z 
2025-12-04T10:11:57.8151130Z [W1204 09:57:49.252693299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8151135Z 
2025-12-04T10:11:57.8151424Z [W1204 09:57:49.253243059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8151427Z 
2025-12-04T10:11:57.8151716Z [W1204 09:57:49.253412951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8151722Z 
2025-12-04T10:11:57.8151803Z ('RERUN', {'yellow': True}) [11.0359s] [100%]
2025-12-04T10:11:57.8152524Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:57:50.052927236 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8152527Z 
2025-12-04T10:11:57.8152850Z [W1204 09:57:50.053481766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8152854Z 
2025-12-04T10:11:57.8153144Z [W1204 09:57:50.053621988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8153147Z 
2025-12-04T10:11:57.8153436Z [W1204 09:57:50.056524768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8153473Z 
2025-12-04T10:11:57.8153761Z [W1204 09:57:50.056975636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8153768Z 
2025-12-04T10:11:57.8154053Z [W1204 09:57:50.057114908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8154056Z 
2025-12-04T10:11:57.8154341Z [W1204 09:57:50.061687357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8154346Z 
2025-12-04T10:11:57.8154635Z [W1204 09:57:50.062148005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8154640Z 
2025-12-04T10:11:57.8154927Z [W1204 09:57:50.062284488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8154933Z 
2025-12-04T10:11:57.8155015Z ('RERUN', {'yellow': True}) [0.4964s] [100%]
2025-12-04T10:11:57.8155790Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 09:57:50.548003983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8155795Z 
2025-12-04T10:11:57.8156087Z [W1204 09:57:50.548569503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8156092Z 
2025-12-04T10:11:57.8156381Z [W1204 09:57:50.548709605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8156385Z 
2025-12-04T10:11:57.8156679Z [W1204 09:57:50.551619595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8156719Z 
2025-12-04T10:11:57.8157012Z [W1204 09:57:50.552072053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8157015Z 
2025-12-04T10:11:57.8157301Z [W1204 09:57:50.552211806 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8157304Z 
2025-12-04T10:11:57.8157590Z [W1204 09:57:50.556690953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8157593Z 
2025-12-04T10:11:57.8157879Z [W1204 09:57:50.557141651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8157882Z 
2025-12-04T10:11:57.8158173Z [W1204 09:57:50.557281523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8158179Z 
2025-12-04T10:11:57.8158240Z FAILED [0.4943s] [100%]
2025-12-04T10:11:57.8158243Z 
2025-12-04T10:11:57.8158330Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8158618Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8158689Z Traceback (most recent call last):
2025-12-04T10:11:57.8158999Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8159062Z     method(*args, **kwargs)
2025-12-04T10:11:57.8159398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8159460Z     method(*args, **kwargs)
2025-12-04T10:11:57.8159751Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8159846Z     with policy():
2025-12-04T10:11:57.8160197Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8160265Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8161066Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8161071Z 
2025-12-04T10:11:57.8161200Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8161718Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8161725Z 
2025-12-04T10:11:57.8161881Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8162009Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8162140Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8162687Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8162815Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8162873Z graph_break []
2025-12-04T10:11:57.8162994Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8163685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8163793Z   if out == self.unknown_value:
2025-12-04T10:11:57.8164086Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8164158Z Traceback (most recent call last):
2025-12-04T10:11:57.8164455Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8164523Z     method(*args, **kwargs)
2025-12-04T10:11:57.8164815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8164879Z     method(*args, **kwargs)
2025-12-04T10:11:57.8165166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8165226Z     with policy():
2025-12-04T10:11:57.8165523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8165588Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8166399Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8166403Z 
2025-12-04T10:11:57.8166527Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8167073Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8167081Z 
2025-12-04T10:11:57.8167236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8167396Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8167489Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8168034Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8168157Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8168217Z graph_break []
2025-12-04T10:11:57.8168339Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8169029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8169098Z   if out == self.unknown_value:
2025-12-04T10:11:57.8169218Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8169309Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8169464Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8170004Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8170061Z graph_break []
2025-12-04T10:11:57.8170143Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8170431Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8170505Z Traceback (most recent call last):
2025-12-04T10:11:57.8170800Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8170898Z     method(*args, **kwargs)
2025-12-04T10:11:57.8171192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8171258Z     method(*args, **kwargs)
2025-12-04T10:11:57.8171549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8171610Z     with policy():
2025-12-04T10:11:57.8171907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8171972Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8172777Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8172783Z 
2025-12-04T10:11:57.8172908Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8173418Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8173425Z 
2025-12-04T10:11:57.8173578Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8173734Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8173827Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8174366Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8174524Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8174584Z graph_break []
2025-12-04T10:11:57.8174706Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8175395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8175462Z   if out == self.unknown_value:
2025-12-04T10:11:57.8175582Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8175674Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8175792Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8176329Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8176435Z graph_break []
2025-12-04T10:11:57.8176559Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8176649Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8176766Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8177306Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8177363Z graph_break []
2025-12-04T10:11:57.8177846Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-27d39a08641974ca.xml -
2025-12-04T10:11:57.8177985Z =========================== short test summary info ============================
2025-12-04T10:11:57.8179257Z FAILED [0.4943s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8179264Z 
2025-12-04T10:11:57.8179387Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8179900Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8179906Z 
2025-12-04T10:11:57.8180067Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8180169Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8180283Z ================== 1 failed, 57 deselected, 2 rerun in 12.05s ==================
2025-12-04T10:11:57.8180342Z Got exit code 1
2025-12-04T10:11:57.8180809Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8181088Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.8181350Z W1204 09:57:57.141000 56752 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8181732Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c115897706ac37ea.xml
2025-12-04T10:11:57.8181954Z ============================= test session starts ==============================
2025-12-04T10:11:57.8182160Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8182225Z cachedir: .pytest_cache
2025-12-04T10:11:57.8182530Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8182614Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8182682Z configfile: pytest.ini
2025-12-04T10:11:57.8182997Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8183123Z collecting ... collected 58 items / 40 deselected / 18 selected
2025-12-04T10:11:57.8183213Z stepcurrent: skipping 40 already run items.
2025-12-04T10:11:57.8183282Z Running 18 items in this shard
2025-12-04T10:11:57.8183288Z 
2025-12-04T10:11:57.8183831Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0367s] [  5%]
2025-12-04T10:11:57.8184325Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6604s] [  5%]
2025-12-04T10:11:57.8184778Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.6525s] [  5%]
2025-12-04T10:11:57.8184785Z 
2025-12-04T10:11:57.8184867Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8185164Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8185275Z Traceback (most recent call last):
2025-12-04T10:11:57.8185580Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8185644Z     method(*args, **kwargs)
2025-12-04T10:11:57.8185941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8186003Z     method(*args, **kwargs)
2025-12-04T10:11:57.8186295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8186352Z     with policy():
2025-12-04T10:11:57.8186646Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8186712Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8187527Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8187534Z 
2025-12-04T10:11:57.8187660Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8188187Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8188191Z 
2025-12-04T10:11:57.8188381Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8188511Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8188604Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8188956Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8189117Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8189176Z graph_break []
2025-12-04T10:11:57.8189472Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8189543Z Traceback (most recent call last):
2025-12-04T10:11:57.8189842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8189906Z     method(*args, **kwargs)
2025-12-04T10:11:57.8190195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8190258Z     method(*args, **kwargs)
2025-12-04T10:11:57.8190543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8190603Z     with policy():
2025-12-04T10:11:57.8190897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8190995Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8191828Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8191832Z 
2025-12-04T10:11:57.8191969Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8192496Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8192541Z 
2025-12-04T10:11:57.8192696Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8192820Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8192914Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8193258Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8193382Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8193442Z graph_break []
2025-12-04T10:11:57.8193564Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8193651Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8193770Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8194106Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8194166Z graph_break []
2025-12-04T10:11:57.8194248Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8194540Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8194613Z Traceback (most recent call last):
2025-12-04T10:11:57.8194912Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8194976Z     method(*args, **kwargs)
2025-12-04T10:11:57.8195315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8195379Z     method(*args, **kwargs)
2025-12-04T10:11:57.8195671Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8195768Z     with policy():
2025-12-04T10:11:57.8196067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8196131Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8196952Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8196956Z 
2025-12-04T10:11:57.8197082Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8197604Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8197612Z 
2025-12-04T10:11:57.8197766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8197920Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8198009Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8198358Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8198479Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8198538Z graph_break []
2025-12-04T10:11:57.8198659Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8198745Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8198865Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8199201Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8199293Z graph_break []
2025-12-04T10:11:57.8199421Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8199507Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8199628Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8200002Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8200062Z graph_break []
2025-12-04T10:11:57.8200555Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c115897706ac37ea.xml -
2025-12-04T10:11:57.8200653Z =========================== short test summary info ============================
2025-12-04T10:11:57.8201975Z FAILED [0.6525s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8201979Z 
2025-12-04T10:11:57.8202100Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8202676Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8202680Z 
2025-12-04T10:11:57.8202869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8202977Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8203095Z ================== 1 failed, 40 deselected, 2 rerun in 3.37s ===================
2025-12-04T10:11:57.8203154Z Got exit code 1
2025-12-04T10:11:57.8203220Z Retrying single test...
2025-12-04T10:11:57.8203483Z W1204 09:58:07.056000 56941 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8203869Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-99c6159c4eb555cf.xml
2025-12-04T10:11:57.8203967Z ============================= test session starts ==============================
2025-12-04T10:11:57.8204171Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8204234Z cachedir: .pytest_cache
2025-12-04T10:11:57.8204540Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8204614Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8204681Z configfile: pytest.ini
2025-12-04T10:11:57.8205026Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8205154Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8205732Z stepcurrent: skipping 40 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8205801Z Running 1 items in this shard
2025-12-04T10:11:57.8205805Z 
2025-12-04T10:11:57.8206550Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:58:08.276757045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8206589Z 
2025-12-04T10:11:57.8206890Z [W1204 09:58:17.285777770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8206894Z 
2025-12-04T10:11:57.8207189Z [W1204 09:58:17.286034515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8207193Z 
2025-12-04T10:11:57.8207487Z [W1204 09:58:17.291679423 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8207490Z 
2025-12-04T10:11:57.8207785Z [W1204 09:58:17.292246553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8207791Z 
2025-12-04T10:11:57.8208079Z [W1204 09:58:17.292422866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8208084Z 
2025-12-04T10:11:57.8208373Z [W1204 09:58:17.297819410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8208376Z 
2025-12-04T10:11:57.8208668Z [W1204 09:58:17.298361499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8208672Z 
2025-12-04T10:11:57.8209015Z [W1204 09:58:17.298536092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8209019Z 
2025-12-04T10:11:57.8209102Z ('RERUN', {'yellow': True}) [11.0330s] [100%]
2025-12-04T10:11:57.8209838Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:58:18.636102449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8209876Z 
2025-12-04T10:11:57.8210170Z [W1204 09:58:18.636636128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8210173Z 
2025-12-04T10:11:57.8210459Z [W1204 09:58:18.636775141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8210463Z 
2025-12-04T10:11:57.8210755Z [W1204 09:58:18.639728201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8210758Z 
2025-12-04T10:11:57.8211045Z [W1204 09:58:18.640333072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8211048Z 
2025-12-04T10:11:57.8211333Z [W1204 09:58:18.640481334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8211342Z 
2025-12-04T10:11:57.8211672Z [W1204 09:58:18.645096515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8211677Z 
2025-12-04T10:11:57.8211966Z [W1204 09:58:18.645564813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8211970Z 
2025-12-04T10:11:57.8212260Z [W1204 09:58:18.645702935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8212265Z 
2025-12-04T10:11:57.8212343Z ('RERUN', {'yellow': True}) [0.5812s] [100%]
2025-12-04T10:11:57.8213076Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:58:19.214436229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8213115Z 
2025-12-04T10:11:57.8213402Z [W1204 09:58:19.214954518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8213406Z 
2025-12-04T10:11:57.8213697Z [W1204 09:58:19.215093271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8213700Z 
2025-12-04T10:11:57.8213985Z [W1204 09:58:19.217994290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8213990Z 
2025-12-04T10:11:57.8214274Z [W1204 09:58:19.218541190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8214281Z 
2025-12-04T10:11:57.8214566Z [W1204 09:58:19.218679562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8214573Z 
2025-12-04T10:11:57.8214859Z [W1204 09:58:19.223174950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8214862Z 
2025-12-04T10:11:57.8215148Z [W1204 09:58:19.223631518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8215151Z 
2025-12-04T10:11:57.8215440Z [W1204 09:58:19.223768081 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8215444Z 
2025-12-04T10:11:57.8215540Z FAILED [0.5804s] [100%]
2025-12-04T10:11:57.8215544Z 
2025-12-04T10:11:57.8215626Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8215922Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8216042Z Traceback (most recent call last):
2025-12-04T10:11:57.8216350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8216416Z     method(*args, **kwargs)
2025-12-04T10:11:57.8216710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8216773Z     method(*args, **kwargs)
2025-12-04T10:11:57.8217228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8217290Z     with policy():
2025-12-04T10:11:57.8217586Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8217660Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8218535Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8218543Z 
2025-12-04T10:11:57.8218675Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8219199Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8219203Z 
2025-12-04T10:11:57.8219366Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8219489Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8219582Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8219932Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8220105Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8220169Z graph_break []
2025-12-04T10:11:57.8220293Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8220988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8221061Z   if out == self.unknown_value:
2025-12-04T10:11:57.8221353Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8221427Z Traceback (most recent call last):
2025-12-04T10:11:57.8221728Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8221792Z     method(*args, **kwargs)
2025-12-04T10:11:57.8222092Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8222153Z     method(*args, **kwargs)
2025-12-04T10:11:57.8222442Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8222503Z     with policy():
2025-12-04T10:11:57.8222798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8222914Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8223743Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8223793Z 
2025-12-04T10:11:57.8223920Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8224449Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8224453Z 
2025-12-04T10:11:57.8224608Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8224736Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8224827Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8225175Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8225302Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8225359Z graph_break []
2025-12-04T10:11:57.8225485Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8226240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8226311Z   if out == self.unknown_value:
2025-12-04T10:11:57.8226434Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8226524Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8226647Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8226990Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8227084Z graph_break []
2025-12-04T10:11:57.8227174Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8227471Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8227544Z Traceback (most recent call last):
2025-12-04T10:11:57.8227854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8227917Z     method(*args, **kwargs)
2025-12-04T10:11:57.8228215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8228283Z     method(*args, **kwargs)
2025-12-04T10:11:57.8228571Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8228633Z     with policy():
2025-12-04T10:11:57.8228927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8228994Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8229821Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8229825Z 
2025-12-04T10:11:57.8229948Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8230517Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8230556Z 
2025-12-04T10:11:57.8230712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8230838Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8230927Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8231271Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8231392Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8231449Z graph_break []
2025-12-04T10:11:57.8231573Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8232270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8232341Z   if out == self.unknown_value:
2025-12-04T10:11:57.8232468Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8232556Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8232714Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8233053Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8233109Z graph_break []
2025-12-04T10:11:57.8233230Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8233317Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8233436Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8233773Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8233885Z graph_break []
2025-12-04T10:11:57.8234372Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-99c6159c4eb555cf.xml -
2025-12-04T10:11:57.8234472Z =========================== short test summary info ============================
2025-12-04T10:11:57.8235788Z FAILED [0.5804s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8235792Z 
2025-12-04T10:11:57.8235915Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8236443Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8236453Z 
2025-12-04T10:11:57.8236607Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8236707Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8236824Z ================== 1 failed, 57 deselected, 2 rerun in 12.22s ==================
2025-12-04T10:11:57.8236882Z Got exit code 1
2025-12-04T10:11:57.8236987Z Retrying single test...
2025-12-04T10:11:57.8237259Z W1204 09:58:25.851000 57135 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8237641Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71859eedfe6269a5.xml
2025-12-04T10:11:57.8237779Z ============================= test session starts ==============================
2025-12-04T10:11:57.8237986Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8238049Z cachedir: .pytest_cache
2025-12-04T10:11:57.8238352Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8238425Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8238489Z configfile: pytest.ini
2025-12-04T10:11:57.8238805Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8238931Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8239507Z stepcurrent: skipping 40 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8239578Z Running 1 items in this shard
2025-12-04T10:11:57.8239582Z 
2025-12-04T10:11:57.8240393Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:58:27.069214635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8240397Z 
2025-12-04T10:11:57.8240697Z [W1204 09:58:36.174502547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8240702Z 
2025-12-04T10:11:57.8240994Z [W1204 09:58:36.174756532 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8241000Z 
2025-12-04T10:11:57.8241288Z [W1204 09:58:36.180653513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8241326Z 
2025-12-04T10:11:57.8241615Z [W1204 09:58:36.181254664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8241618Z 
2025-12-04T10:11:57.8241909Z [W1204 09:58:36.181434447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8241913Z 
2025-12-04T10:11:57.8242200Z [W1204 09:58:36.186929802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8242204Z 
2025-12-04T10:11:57.8242492Z [W1204 09:58:36.187485392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8242496Z 
2025-12-04T10:11:57.8242782Z [W1204 09:58:36.187658325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8242789Z 
2025-12-04T10:11:57.8242870Z ('RERUN', {'yellow': True}) [11.1213s] [100%]
2025-12-04T10:11:57.8243603Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:58:37.514024261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8243607Z 
2025-12-04T10:11:57.8243903Z [W1204 09:58:37.514553700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8243949Z 
2025-12-04T10:11:57.8244241Z [W1204 09:58:37.514696453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8244244Z 
2025-12-04T10:11:57.8244530Z [W1204 09:58:37.517561143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8244572Z 
2025-12-04T10:11:57.8244861Z [W1204 09:58:37.518111912 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8244865Z 
2025-12-04T10:11:57.8245151Z [W1204 09:58:37.518256755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8245155Z 
2025-12-04T10:11:57.8245448Z [W1204 09:58:37.522693542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8245452Z 
2025-12-04T10:11:57.8245742Z [W1204 09:58:37.523149000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8245745Z 
2025-12-04T10:11:57.8246038Z [W1204 09:58:37.523284182 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8246043Z 
2025-12-04T10:11:57.8246120Z ('RERUN', {'yellow': True}) [0.5752s] [100%]
2025-12-04T10:11:57.8246887Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 09:58:38.087010680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8246891Z 
2025-12-04T10:11:57.8247180Z [W1204 09:58:38.087539439 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8247183Z 
2025-12-04T10:11:57.8247469Z [W1204 09:58:38.087683152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8247475Z 
2025-12-04T10:11:57.8247764Z [W1204 09:58:38.090525091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8247801Z 
2025-12-04T10:11:57.8248089Z [W1204 09:58:38.091076211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8248093Z 
2025-12-04T10:11:57.8248381Z [W1204 09:58:38.091215173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8248384Z 
2025-12-04T10:11:57.8248668Z [W1204 09:58:38.095615479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8248671Z 
2025-12-04T10:11:57.8248962Z [W1204 09:58:38.096062097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8248966Z 
2025-12-04T10:11:57.8249252Z [W1204 09:58:38.096198400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8249258Z 
2025-12-04T10:11:57.8249324Z FAILED [0.5677s] [100%]
2025-12-04T10:11:57.8249328Z 
2025-12-04T10:11:57.8249410Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8249715Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8249795Z Traceback (most recent call last):
2025-12-04T10:11:57.8250100Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8250167Z     method(*args, **kwargs)
2025-12-04T10:11:57.8250499Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8250563Z     method(*args, **kwargs)
2025-12-04T10:11:57.8250852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8250947Z     with policy():
2025-12-04T10:11:57.8251242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8251313Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8252122Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8252126Z 
2025-12-04T10:11:57.8252257Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8252782Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8252788Z 
2025-12-04T10:11:57.8252948Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8253071Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8253200Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8253550Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8253673Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8253734Z graph_break []
2025-12-04T10:11:57.8253860Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8254553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8254660Z   if out == self.unknown_value:
2025-12-04T10:11:57.8254952Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8255026Z Traceback (most recent call last):
2025-12-04T10:11:57.8255322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8255384Z     method(*args, **kwargs)
2025-12-04T10:11:57.8255678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8255741Z     method(*args, **kwargs)
2025-12-04T10:11:57.8256032Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8256093Z     with policy():
2025-12-04T10:11:57.8256385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8256453Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8257285Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8257290Z 
2025-12-04T10:11:57.8257414Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8258224Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8258229Z 
2025-12-04T10:11:57.8258388Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8258552Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8258645Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8259004Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8259131Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8259188Z graph_break []
2025-12-04T10:11:57.8259312Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8260006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8260073Z   if out == self.unknown_value:
2025-12-04T10:11:57.8260197Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8260289Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8260410Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8260804Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8260864Z graph_break []
2025-12-04T10:11:57.8260949Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8261240Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8261314Z Traceback (most recent call last):
2025-12-04T10:11:57.8261614Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8261676Z     method(*args, **kwargs)
2025-12-04T10:11:57.8261973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8262072Z     method(*args, **kwargs)
2025-12-04T10:11:57.8262361Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8262421Z     with policy():
2025-12-04T10:11:57.8262714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8262780Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8263611Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8263617Z 
2025-12-04T10:11:57.8263737Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8264265Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8264269Z 
2025-12-04T10:11:57.8264424Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8264550Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8264637Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8265011Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8265135Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8265202Z graph_break []
2025-12-04T10:11:57.8265361Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8266048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8266114Z   if out == self.unknown_value:
2025-12-04T10:11:57.8266236Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8266324Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8266443Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8266788Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8266845Z graph_break []
2025-12-04T10:11:57.8266966Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8267054Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8267174Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8267546Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8267603Z graph_break []
2025-12-04T10:11:57.8268089Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71859eedfe6269a5.xml -
2025-12-04T10:11:57.8268190Z =========================== short test summary info ============================
2025-12-04T10:11:57.8269503Z FAILED [0.5677s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8269546Z 
2025-12-04T10:11:57.8269668Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8270191Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8270194Z 
2025-12-04T10:11:57.8270351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8270455Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8270573Z ================== 1 failed, 57 deselected, 2 rerun in 12.29s ==================
2025-12-04T10:11:57.8270634Z Got exit code 1
2025-12-04T10:11:57.8271116Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8271372Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.8271633Z W1204 09:58:44.714000 57329 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8272019Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b7ad6bc433aca4f5.xml
2025-12-04T10:11:57.8272150Z ============================= test session starts ==============================
2025-12-04T10:11:57.8272357Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8272426Z cachedir: .pytest_cache
2025-12-04T10:11:57.8272762Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8272838Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8272907Z configfile: pytest.ini
2025-12-04T10:11:57.8273222Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8273349Z collecting ... collected 58 items / 41 deselected / 17 selected
2025-12-04T10:11:57.8273433Z stepcurrent: skipping 41 already run items.
2025-12-04T10:11:57.8273500Z Running 17 items in this shard
2025-12-04T10:11:57.8273504Z 
2025-12-04T10:11:57.8274010Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8657s] [  5%]
2025-12-04T10:11:57.8274499Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4558s] [  5%]
2025-12-04T10:11:57.8274979Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4519s] [  5%]
2025-12-04T10:11:57.8274983Z 
2025-12-04T10:11:57.8275064Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8275362Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8275437Z Traceback (most recent call last):
2025-12-04T10:11:57.8275744Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8275810Z     method(*args, **kwargs)
2025-12-04T10:11:57.8276102Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8276201Z     method(*args, **kwargs)
2025-12-04T10:11:57.8276497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8276558Z     with policy():
2025-12-04T10:11:57.8276852Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8276916Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8277718Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8277722Z 
2025-12-04T10:11:57.8277860Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8278385Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8278390Z 
2025-12-04T10:11:57.8278547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8278673Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8278764Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8279148Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8279274Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8279333Z graph_break []
2025-12-04T10:11:57.8279623Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8279729Z Traceback (most recent call last):
2025-12-04T10:11:57.8280112Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8280178Z     method(*args, **kwargs)
2025-12-04T10:11:57.8280470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8280534Z     method(*args, **kwargs)
2025-12-04T10:11:57.8280821Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8280884Z     with policy():
2025-12-04T10:11:57.8281178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8281243Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8282058Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8282104Z 
2025-12-04T10:11:57.8282228Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8282750Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8282753Z 
2025-12-04T10:11:57.8282909Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8283036Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8283126Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8283469Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8283631Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8283688Z graph_break []
2025-12-04T10:11:57.8283812Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8283903Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8284022Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8284365Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8284426Z graph_break []
2025-12-04T10:11:57.8284506Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8284797Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8284870Z Traceback (most recent call last):
2025-12-04T10:11:57.8285167Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8285235Z     method(*args, **kwargs)
2025-12-04T10:11:57.8285532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8285596Z     method(*args, **kwargs)
2025-12-04T10:11:57.8285886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8285944Z     with policy():
2025-12-04T10:11:57.8286277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8286343Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8287161Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8287219Z 
2025-12-04T10:11:57.8287343Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8287863Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8287870Z 
2025-12-04T10:11:57.8288025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8288150Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8288251Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8288605Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8288731Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8288793Z graph_break []
2025-12-04T10:11:57.8288950Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8289043Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8289163Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8289508Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8289570Z graph_break []
2025-12-04T10:11:57.8289691Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8289776Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8289899Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8290270Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8290333Z graph_break []
2025-12-04T10:11:57.8290823Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b7ad6bc433aca4f5.xml -
2025-12-04T10:11:57.8290921Z =========================== short test summary info ============================
2025-12-04T10:11:57.8292229Z FAILED [0.4519s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8292237Z 
2025-12-04T10:11:57.8292358Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8292879Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8292883Z 
2025-12-04T10:11:57.8293042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8293261Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8293378Z ================== 1 failed, 41 deselected, 2 rerun in 2.80s ===================
2025-12-04T10:11:57.8293436Z Got exit code 1
2025-12-04T10:11:57.8293504Z Retrying single test...
2025-12-04T10:11:57.8293764Z W1204 09:58:54.422000 57517 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8294188Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb71453e3d3b813.xml
2025-12-04T10:11:57.8294285Z ============================= test session starts ==============================
2025-12-04T10:11:57.8294492Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8294559Z cachedir: .pytest_cache
2025-12-04T10:11:57.8294861Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8294937Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8295003Z configfile: pytest.ini
2025-12-04T10:11:57.8295318Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8295449Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8296063Z stepcurrent: skipping 41 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8296133Z Running 1 items in this shard
2025-12-04T10:11:57.8296137Z 
2025-12-04T10:11:57.8296869Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:58:55.488246137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8296873Z 
2025-12-04T10:11:57.8297171Z [W1204 09:59:04.552358582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8297175Z 
2025-12-04T10:11:57.8297469Z [W1204 09:59:04.552594176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8297507Z 
2025-12-04T10:11:57.8297805Z [W1204 09:59:04.558099260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8297809Z 
2025-12-04T10:11:57.8298097Z [W1204 09:59:04.558620979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8298100Z 
2025-12-04T10:11:57.8298390Z [W1204 09:59:04.558791692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8298394Z 
2025-12-04T10:11:57.8298706Z [W1204 09:59:04.564106963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8298709Z 
2025-12-04T10:11:57.8299006Z [W1204 09:59:04.564653142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8299012Z 
2025-12-04T10:11:57.8299303Z [W1204 09:59:04.564825245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8299308Z 
2025-12-04T10:11:57.8299391Z ('RERUN', {'yellow': True}) [10.9340s] [100%]
2025-12-04T10:11:57.8300125Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:59:05.738239605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8300162Z 
2025-12-04T10:11:57.8300458Z [W1204 09:59:05.738778935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8300462Z 
2025-12-04T10:11:57.8300747Z [W1204 09:59:05.738925917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8300786Z 
2025-12-04T10:11:57.8301077Z [W1204 09:59:05.741844537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8301080Z 
2025-12-04T10:11:57.8301367Z [W1204 09:59:05.742421617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8301370Z 
2025-12-04T10:11:57.8301658Z [W1204 09:59:05.742563649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8301662Z 
2025-12-04T10:11:57.8301949Z [W1204 09:59:05.746981535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8301953Z 
2025-12-04T10:11:57.8302241Z [W1204 09:59:05.747440542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8302248Z 
2025-12-04T10:11:57.8302535Z [W1204 09:59:05.747581835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8302571Z 
2025-12-04T10:11:57.8302651Z ('RERUN', {'yellow': True}) [0.4176s] [100%]
2025-12-04T10:11:57.8303388Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:59:06.154333725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8303392Z 
2025-12-04T10:11:57.8303680Z [W1204 09:59:06.154869114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8303683Z 
2025-12-04T10:11:57.8303971Z [W1204 09:59:06.155015116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8304009Z 
2025-12-04T10:11:57.8304298Z [W1204 09:59:06.157886805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8304301Z 
2025-12-04T10:11:57.8304589Z [W1204 09:59:06.158442895 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8304592Z 
2025-12-04T10:11:57.8304879Z [W1204 09:59:06.158581137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8304883Z 
2025-12-04T10:11:57.8305173Z [W1204 09:59:06.163033363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8305176Z 
2025-12-04T10:11:57.8305460Z [W1204 09:59:06.163497971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8305466Z 
2025-12-04T10:11:57.8305755Z [W1204 09:59:06.163637784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8305760Z 
2025-12-04T10:11:57.8305820Z FAILED [0.4131s] [100%]
2025-12-04T10:11:57.8305824Z 
2025-12-04T10:11:57.8305905Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8306202Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8306275Z Traceback (most recent call last):
2025-12-04T10:11:57.8306628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8306700Z     method(*args, **kwargs)
2025-12-04T10:11:57.8306998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8307097Z     method(*args, **kwargs)
2025-12-04T10:11:57.8307388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8307449Z     with policy():
2025-12-04T10:11:57.8307755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8307819Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8308641Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8308645Z 
2025-12-04T10:11:57.8308781Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8309306Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8309314Z 
2025-12-04T10:11:57.8309512Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8309644Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8309744Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8310098Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8310233Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8310296Z graph_break []
2025-12-04T10:11:57.8310420Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8311118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8311225Z   if out == self.unknown_value:
2025-12-04T10:11:57.8311524Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8311600Z Traceback (most recent call last):
2025-12-04T10:11:57.8311898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8311967Z     method(*args, **kwargs)
2025-12-04T10:11:57.8312259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8312322Z     method(*args, **kwargs)
2025-12-04T10:11:57.8312616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8312678Z     with policy():
2025-12-04T10:11:57.8312975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8313044Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8313861Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8313865Z 
2025-12-04T10:11:57.8314047Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8314566Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8314605Z 
2025-12-04T10:11:57.8314768Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8314891Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8314985Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8315338Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8315465Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8315523Z graph_break []
2025-12-04T10:11:57.8315663Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8316357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8316432Z   if out == self.unknown_value:
2025-12-04T10:11:57.8316552Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8316680Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8316807Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8317311Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8317374Z graph_break []
2025-12-04T10:11:57.8317460Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8317755Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8317832Z Traceback (most recent call last):
2025-12-04T10:11:57.8318132Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8318262Z     method(*args, **kwargs)
2025-12-04T10:11:57.8318562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8318626Z     method(*args, **kwargs)
2025-12-04T10:11:57.8318921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8318983Z     with policy():
2025-12-04T10:11:57.8319293Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8319363Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8320223Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8320231Z 
2025-12-04T10:11:57.8320359Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8320878Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8320882Z 
2025-12-04T10:11:57.8321046Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8321223Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8321316Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8321659Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8321831Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8321889Z graph_break []
2025-12-04T10:11:57.8322014Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8322704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8322777Z   if out == self.unknown_value:
2025-12-04T10:11:57.8322899Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8322990Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8323116Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8323458Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8323521Z graph_break []
2025-12-04T10:11:57.8323641Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8323775Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8323900Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8324239Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8324295Z graph_break []
2025-12-04T10:11:57.8324798Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb71453e3d3b813.xml -
2025-12-04T10:11:57.8324899Z =========================== short test summary info ============================
2025-12-04T10:11:57.8326199Z FAILED [0.4131s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8326241Z 
2025-12-04T10:11:57.8326367Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8326895Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8326898Z 
2025-12-04T10:11:57.8327056Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8327160Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8327281Z ================== 1 failed, 57 deselected, 2 rerun in 11.79s ==================
2025-12-04T10:11:57.8327338Z Got exit code 1
2025-12-04T10:11:57.8327400Z Retrying single test...
2025-12-04T10:11:57.8327661Z W1204 09:59:12.814000 57710 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8328047Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1f8f7752fccd9869.xml
2025-12-04T10:11:57.8328148Z ============================= test session starts ==============================
2025-12-04T10:11:57.8328391Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8328467Z cachedir: .pytest_cache
2025-12-04T10:11:57.8328773Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8328886Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8328953Z configfile: pytest.ini
2025-12-04T10:11:57.8329270Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8329400Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8329982Z stepcurrent: skipping 41 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8330054Z Running 1 items in this shard
2025-12-04T10:11:57.8330057Z 
2025-12-04T10:11:57.8330796Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:59:13.879775141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8330803Z 
2025-12-04T10:11:57.8331101Z [W1204 09:59:23.030145615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8331104Z 
2025-12-04T10:11:57.8331435Z [W1204 09:59:23.030408309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8331439Z 
2025-12-04T10:11:57.8331727Z [W1204 09:59:23.036151716 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8331731Z 
2025-12-04T10:11:57.8332018Z [W1204 09:59:23.036753345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8332021Z 
2025-12-04T10:11:57.8332306Z [W1204 09:59:23.036927117 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8332343Z 
2025-12-04T10:11:57.8332630Z [W1204 09:59:23.042346509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8332635Z 
2025-12-04T10:11:57.8332926Z [W1204 09:59:23.042915727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8332929Z 
2025-12-04T10:11:57.8333213Z [W1204 09:59:23.043088610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8333216Z 
2025-12-04T10:11:57.8333301Z ('RERUN', {'yellow': True}) [11.0206s] [100%]
2025-12-04T10:11:57.8334032Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:59:24.216615905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8334038Z 
2025-12-04T10:11:57.8334330Z [W1204 09:59:24.217140242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8334333Z 
2025-12-04T10:11:57.8334620Z [W1204 09:59:24.217280644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8334624Z 
2025-12-04T10:11:57.8334914Z [W1204 09:59:24.220239479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8334917Z 
2025-12-04T10:11:57.8335237Z [W1204 09:59:24.220807948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8335240Z 
2025-12-04T10:11:57.8335529Z [W1204 09:59:24.220948740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8335567Z 
2025-12-04T10:11:57.8335857Z [W1204 09:59:24.225550099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8335860Z 
2025-12-04T10:11:57.8336148Z [W1204 09:59:24.226010016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8336151Z 
2025-12-04T10:11:57.8336441Z [W1204 09:59:24.226148768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8336444Z 
2025-12-04T10:11:57.8336523Z ('RERUN', {'yellow': True}) [0.4125s] [100%]
2025-12-04T10:11:57.8337255Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 09:59:24.626037983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8337260Z 
2025-12-04T10:11:57.8337549Z [W1204 09:59:24.626556760 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8337552Z 
2025-12-04T10:11:57.8337891Z [W1204 09:59:24.626696423 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8337894Z 
2025-12-04T10:11:57.8338184Z [W1204 09:59:24.629556406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8338188Z 
2025-12-04T10:11:57.8338486Z [W1204 09:59:24.630121644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8338489Z 
2025-12-04T10:11:57.8338777Z [W1204 09:59:24.630265476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8338782Z 
2025-12-04T10:11:57.8339133Z [W1204 09:59:24.634715413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8339139Z 
2025-12-04T10:11:57.8339428Z [W1204 09:59:24.635164390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8339432Z 
2025-12-04T10:11:57.8339719Z [W1204 09:59:24.635298712 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8339722Z 
2025-12-04T10:11:57.8339785Z FAILED [0.4034s] [100%]
2025-12-04T10:11:57.8339788Z 
2025-12-04T10:11:57.8339870Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8340172Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8340245Z Traceback (most recent call last):
2025-12-04T10:11:57.8340551Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8340626Z     method(*args, **kwargs)
2025-12-04T10:11:57.8340922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8340990Z     method(*args, **kwargs)
2025-12-04T10:11:57.8341281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8341338Z     with policy():
2025-12-04T10:11:57.8341678Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8341748Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8342550Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8342593Z 
2025-12-04T10:11:57.8342723Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8343243Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8343247Z 
2025-12-04T10:11:57.8343410Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8343537Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8343634Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8343982Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8344113Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8344174Z graph_break []
2025-12-04T10:11:57.8344298Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8345024Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8345099Z   if out == self.unknown_value:
2025-12-04T10:11:57.8345392Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8345470Z Traceback (most recent call last):
2025-12-04T10:11:57.8345770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8345833Z     method(*args, **kwargs)
2025-12-04T10:11:57.8346168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8346229Z     method(*args, **kwargs)
2025-12-04T10:11:57.8346524Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8346583Z     with policy():
2025-12-04T10:11:57.8346877Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8346945Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8347766Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8347771Z 
2025-12-04T10:11:57.8347900Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8348429Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8348432Z 
2025-12-04T10:11:57.8348587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8348714Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8348811Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8349200Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8349327Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8349385Z graph_break []
2025-12-04T10:11:57.8349544Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8350240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8350315Z   if out == self.unknown_value:
2025-12-04T10:11:57.8350437Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8350527Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8350653Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8350997Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8351054Z graph_break []
2025-12-04T10:11:57.8351140Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8351430Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8351541Z Traceback (most recent call last):
2025-12-04T10:11:57.8351844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8351907Z     method(*args, **kwargs)
2025-12-04T10:11:57.8352201Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8352263Z     method(*args, **kwargs)
2025-12-04T10:11:57.8352552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8352612Z     with policy():
2025-12-04T10:11:57.8352906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8353008Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8353824Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8353828Z 
2025-12-04T10:11:57.8353955Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8354477Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8354481Z 
2025-12-04T10:11:57.8354637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8354767Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8354860Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8355207Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8355328Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8355384Z graph_break []
2025-12-04T10:11:57.8355507Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8356229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8356301Z   if out == self.unknown_value:
2025-12-04T10:11:57.8356422Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8356548Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8356673Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8357013Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8357070Z graph_break []
2025-12-04T10:11:57.8357195Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8357281Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8357400Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8357739Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8357796Z graph_break []
2025-12-04T10:11:57.8358294Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1f8f7752fccd9869.xml -
2025-12-04T10:11:57.8358395Z =========================== short test summary info ============================
2025-12-04T10:11:57.8360029Z FAILED [0.4034s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8360038Z 
2025-12-04T10:11:57.8360192Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8360728Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8360770Z 
2025-12-04T10:11:57.8360934Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8361045Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8361164Z ================== 1 failed, 57 deselected, 2 rerun in 11.86s ==================
2025-12-04T10:11:57.8361227Z Got exit code 1
2025-12-04T10:11:57.8361707Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8361955Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.8362221Z W1204 09:59:31.282000 57903 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8362610Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b45e15b9b3058993.xml
2025-12-04T10:11:57.8362707Z ============================= test session starts ==============================
2025-12-04T10:11:57.8362915Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8362988Z cachedir: .pytest_cache
2025-12-04T10:11:57.8363296Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8363375Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8363474Z configfile: pytest.ini
2025-12-04T10:11:57.8363793Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8363929Z collecting ... collected 58 items / 42 deselected / 16 selected
2025-12-04T10:11:57.8364051Z stepcurrent: skipping 42 already run items.
2025-12-04T10:11:57.8364123Z Running 16 items in this shard
2025-12-04T10:11:57.8364130Z 
2025-12-04T10:11:57.8364633Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9473s] [  6%]
2025-12-04T10:11:57.8365121Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5434s] [  6%]
2025-12-04T10:11:57.8365566Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5327s] [  6%]
2025-12-04T10:11:57.8365570Z 
2025-12-04T10:11:57.8365653Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8365949Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8366024Z Traceback (most recent call last):
2025-12-04T10:11:57.8366393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8366466Z     method(*args, **kwargs)
2025-12-04T10:11:57.8366761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8366825Z     method(*args, **kwargs)
2025-12-04T10:11:57.8367119Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8367181Z     with policy():
2025-12-04T10:11:57.8367477Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8367540Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8368343Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8368387Z 
2025-12-04T10:11:57.8368514Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8369036Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8369040Z 
2025-12-04T10:11:57.8369202Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8369330Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8369427Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8369979Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8370107Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8370168Z graph_break []
2025-12-04T10:11:57.8370454Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8370531Z Traceback (most recent call last):
2025-12-04T10:11:57.8370866Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8370931Z     method(*args, **kwargs)
2025-12-04T10:11:57.8371225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8371320Z     method(*args, **kwargs)
2025-12-04T10:11:57.8371613Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8371675Z     with policy():
2025-12-04T10:11:57.8371971Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8372039Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8372848Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8372853Z 
2025-12-04T10:11:57.8372976Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8373499Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8373506Z 
2025-12-04T10:11:57.8373698Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8373828Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8373919Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8374461Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8374591Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8374649Z graph_break []
2025-12-04T10:11:57.8374774Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8374863Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8375016Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8375554Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8375611Z graph_break []
2025-12-04T10:11:57.8375708Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8376005Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8376079Z Traceback (most recent call last):
2025-12-04T10:11:57.8376380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8376442Z     method(*args, **kwargs)
2025-12-04T10:11:57.8376734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8376798Z     method(*args, **kwargs)
2025-12-04T10:11:57.8377087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8377147Z     with policy():
2025-12-04T10:11:57.8377440Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8377505Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8378364Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8378400Z 
2025-12-04T10:11:57.8378527Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8379051Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8379054Z 
2025-12-04T10:11:57.8379208Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8379334Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8379423Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8379963Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8380091Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8380150Z graph_break []
2025-12-04T10:11:57.8380271Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8380360Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8380513Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8381050Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8381110Z graph_break []
2025-12-04T10:11:57.8381233Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8381325Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8381444Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8381978Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8382073Z graph_break []
2025-12-04T10:11:57.8382574Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b45e15b9b3058993.xml -
2025-12-04T10:11:57.8382679Z =========================== short test summary info ============================
2025-12-04T10:11:57.8383974Z FAILED [0.5327s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8383982Z 
2025-12-04T10:11:57.8384106Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8384627Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8384631Z 
2025-12-04T10:11:57.8384788Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8384891Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8385043Z ================== 1 failed, 42 deselected, 2 rerun in 3.05s ===================
2025-12-04T10:11:57.8385105Z Got exit code 1
2025-12-04T10:11:57.8385169Z Retrying single test...
2025-12-04T10:11:57.8385428Z W1204 09:59:40.950000 58092 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8385850Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ab51729f4958ddc5.xml
2025-12-04T10:11:57.8385944Z ============================= test session starts ==============================
2025-12-04T10:11:57.8386157Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8386222Z cachedir: .pytest_cache
2025-12-04T10:11:57.8386529Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8386607Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8386675Z configfile: pytest.ini
2025-12-04T10:11:57.8386994Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8387121Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8387696Z stepcurrent: skipping 42 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8387808Z Running 1 items in this shard
2025-12-04T10:11:57.8387812Z 
2025-12-04T10:11:57.8388542Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:59:42.539445427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8388546Z 
2025-12-04T10:11:57.8388858Z [W1204 09:59:51.765851091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8388862Z 
2025-12-04T10:11:57.8389160Z [W1204 09:59:51.766099266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8389199Z 
2025-12-04T10:11:57.8389496Z [W1204 09:59:51.772779021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8389500Z 
2025-12-04T10:11:57.8389787Z [W1204 09:59:51.773391321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8389790Z 
2025-12-04T10:11:57.8390078Z [W1204 09:59:51.773575885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8390082Z 
2025-12-04T10:11:57.8390368Z [W1204 09:59:51.778993598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8390372Z 
2025-12-04T10:11:57.8390658Z [W1204 09:59:51.779515607 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8390663Z 
2025-12-04T10:11:57.8390957Z [W1204 09:59:51.779688380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8390960Z 
2025-12-04T10:11:57.8391042Z ('RERUN', {'yellow': True}) [11.1748s] [100%]
2025-12-04T10:11:57.8391766Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:59:52.586513809 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8391771Z 
2025-12-04T10:11:57.8392094Z [W1204 09:59:52.587029988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8392098Z 
2025-12-04T10:11:57.8392388Z [W1204 09:59:52.587169200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8392442Z 
2025-12-04T10:11:57.8392732Z [W1204 09:59:52.590145731 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8392735Z 
2025-12-04T10:11:57.8393029Z [W1204 09:59:52.590610169 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8393032Z 
2025-12-04T10:11:57.8393319Z [W1204 09:59:52.590748591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8393322Z 
2025-12-04T10:11:57.8393608Z [W1204 09:59:52.595332370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8393615Z 
2025-12-04T10:11:57.8393901Z [W1204 09:59:52.595791688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8393905Z 
2025-12-04T10:11:57.8394200Z [W1204 09:59:52.595929581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8394205Z 
2025-12-04T10:11:57.8394325Z ('RERUN', {'yellow': True}) [0.5024s] [100%]
2025-12-04T10:11:57.8395050Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 09:59:53.087294841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8395055Z 
2025-12-04T10:11:57.8395349Z [W1204 09:59:53.087812480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8395352Z 
2025-12-04T10:11:57.8395639Z [W1204 09:59:53.087952533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8395676Z 
2025-12-04T10:11:57.8395966Z [W1204 09:59:53.090904453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8395969Z 
2025-12-04T10:11:57.8396256Z [W1204 09:59:53.091365691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8396260Z 
2025-12-04T10:11:57.8396547Z [W1204 09:59:53.091502304 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8396550Z 
2025-12-04T10:11:57.8396841Z [W1204 09:59:53.096014761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8396844Z 
2025-12-04T10:11:57.8397130Z [W1204 09:59:53.096485460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8397138Z 
2025-12-04T10:11:57.8397427Z [W1204 09:59:53.096622032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8397430Z 
2025-12-04T10:11:57.8397490Z FAILED [0.5024s] [100%]
2025-12-04T10:11:57.8397495Z 
2025-12-04T10:11:57.8397578Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8397873Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8397946Z Traceback (most recent call last):
2025-12-04T10:11:57.8398287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8398352Z     method(*args, **kwargs)
2025-12-04T10:11:57.8398652Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8398749Z     method(*args, **kwargs)
2025-12-04T10:11:57.8399039Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8399098Z     with policy():
2025-12-04T10:11:57.8399395Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8399464Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8400297Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8400302Z 
2025-12-04T10:11:57.8400429Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8400950Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8400957Z 
2025-12-04T10:11:57.8401114Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8401285Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8401380Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8401924Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8402055Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8402113Z graph_break []
2025-12-04T10:11:57.8402241Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8402933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8403113Z   if out == self.unknown_value:
2025-12-04T10:11:57.8403412Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8403486Z Traceback (most recent call last):
2025-12-04T10:11:57.8403788Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8403858Z     method(*args, **kwargs)
2025-12-04T10:11:57.8404152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8404216Z     method(*args, **kwargs)
2025-12-04T10:11:57.8404504Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8404569Z     with policy():
2025-12-04T10:11:57.8404864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8404931Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8405740Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8405745Z 
2025-12-04T10:11:57.8405906Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8406428Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8406466Z 
2025-12-04T10:11:57.8406624Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8406758Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8406859Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8407399Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8407527Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8407590Z graph_break []
2025-12-04T10:11:57.8407712Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8408413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8408484Z   if out == self.unknown_value:
2025-12-04T10:11:57.8408645Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8408736Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8408860Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8409409Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8409468Z graph_break []
2025-12-04T10:11:57.8409550Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8409843Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8409950Z Traceback (most recent call last):
2025-12-04T10:11:57.8410252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8410371Z     method(*args, **kwargs)
2025-12-04T10:11:57.8410699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8414874Z     method(*args, **kwargs)
2025-12-04T10:11:57.8415254Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8415324Z     with policy():
2025-12-04T10:11:57.8415650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8415722Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8416554Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8416568Z 
2025-12-04T10:11:57.8416703Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8417453Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8417458Z 
2025-12-04T10:11:57.8417740Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8417890Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8417993Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8418540Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8418734Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8418799Z graph_break []
2025-12-04T10:11:57.8418927Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8419633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8419707Z   if out == self.unknown_value:
2025-12-04T10:11:57.8419832Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8419933Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8420063Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8420662Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8420726Z graph_break []
2025-12-04T10:11:57.8420848Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8420940Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8421073Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8421610Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8421673Z graph_break []
2025-12-04T10:11:57.8422179Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ab51729f4958ddc5.xml -
2025-12-04T10:11:57.8422332Z =========================== short test summary info ============================
2025-12-04T10:11:57.8423639Z FAILED [0.5024s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8423643Z 
2025-12-04T10:11:57.8423783Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8424309Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8424316Z 
2025-12-04T10:11:57.8424482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8424591Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8424707Z ================== 1 failed, 57 deselected, 2 rerun in 12.20s ==================
2025-12-04T10:11:57.8424769Z Got exit code 1
2025-12-04T10:11:57.8424834Z Retrying single test...
2025-12-04T10:11:57.8425140Z W1204 09:59:59.747000 58286 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8425536Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c75b79372dbd5cd7.xml
2025-12-04T10:11:57.8425631Z ============================= test session starts ==============================
2025-12-04T10:11:57.8425882Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8425948Z cachedir: .pytest_cache
2025-12-04T10:11:57.8426257Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8426339Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8426404Z configfile: pytest.ini
2025-12-04T10:11:57.8426722Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8426860Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8427437Z stepcurrent: skipping 42 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8427513Z Running 1 items in this shard
2025-12-04T10:11:57.8427517Z 
2025-12-04T10:11:57.8428287Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:00:01.344704453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8428292Z 
2025-12-04T10:11:57.8428592Z [W1204 10:00:10.280196486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8428596Z 
2025-12-04T10:11:57.8428887Z [W1204 10:00:10.280469020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8428891Z 
2025-12-04T10:11:57.8429177Z [W1204 10:00:10.286489393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8429220Z 
2025-12-04T10:11:57.8429508Z [W1204 10:00:10.287096063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8429511Z 
2025-12-04T10:11:57.8429802Z [W1204 10:00:10.287271195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8429806Z 
2025-12-04T10:11:57.8430094Z [W1204 10:00:10.292758731 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8430098Z 
2025-12-04T10:11:57.8430384Z [W1204 10:00:10.293290469 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8430387Z 
2025-12-04T10:11:57.8430675Z [W1204 10:00:10.293449542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8430680Z 
2025-12-04T10:11:57.8430762Z ('RERUN', {'yellow': True}) [10.8914s] [100%]
2025-12-04T10:11:57.8431487Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:00:11.096759827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8431491Z 
2025-12-04T10:11:57.8431796Z [W1204 10:00:11.097268604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8431800Z 
2025-12-04T10:11:57.8432131Z [W1204 10:00:11.097411647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8432135Z 
2025-12-04T10:11:57.8432423Z [W1204 10:00:11.100366272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8432459Z 
2025-12-04T10:11:57.8432749Z [W1204 10:00:11.100820640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8432752Z 
2025-12-04T10:11:57.8433040Z [W1204 10:00:11.100962242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8433043Z 
2025-12-04T10:11:57.8433327Z [W1204 10:00:11.105480752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8433330Z 
2025-12-04T10:11:57.8433618Z [W1204 10:00:11.105939180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8433621Z 
2025-12-04T10:11:57.8433909Z [W1204 10:00:11.106079422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8433912Z 
2025-12-04T10:11:57.8433994Z ('RERUN', {'yellow': True}) [0.4999s] [100%]
2025-12-04T10:11:57.8434753Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:00:11.594781890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8434757Z 
2025-12-04T10:11:57.8435048Z [W1204 10:00:11.595299538 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8435051Z 
2025-12-04T10:11:57.8435339Z [W1204 10:00:11.595438960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8435342Z 
2025-12-04T10:11:57.8435626Z [W1204 10:00:11.598361456 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8435633Z 
2025-12-04T10:11:57.8435916Z [W1204 10:00:11.598814843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8435953Z 
2025-12-04T10:11:57.8436241Z [W1204 10:00:11.598952105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8436244Z 
2025-12-04T10:11:57.8436531Z [W1204 10:00:11.603521827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8436534Z 
2025-12-04T10:11:57.8436821Z [W1204 10:00:11.603982964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8436824Z 
2025-12-04T10:11:57.8437112Z [W1204 10:00:11.604123236 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8437115Z 
2025-12-04T10:11:57.8437178Z FAILED [0.4938s] [100%]
2025-12-04T10:11:57.8437183Z 
2025-12-04T10:11:57.8437270Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8437567Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8437642Z Traceback (most recent call last):
2025-12-04T10:11:57.8437952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8438017Z     method(*args, **kwargs)
2025-12-04T10:11:57.8438311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8438413Z     method(*args, **kwargs)
2025-12-04T10:11:57.8438702Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8438764Z     with policy():
2025-12-04T10:11:57.8439090Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8439156Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8440024Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8440029Z 
2025-12-04T10:11:57.8440163Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8440694Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8440698Z 
2025-12-04T10:11:57.8440859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8440993Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8441089Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8441702Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8441838Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8441899Z graph_break []
2025-12-04T10:11:57.8442023Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8442727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8442798Z   if out == self.unknown_value:
2025-12-04T10:11:57.8443122Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8443196Z Traceback (most recent call last):
2025-12-04T10:11:57.8443502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8443577Z     method(*args, **kwargs)
2025-12-04T10:11:57.8443868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8443932Z     method(*args, **kwargs)
2025-12-04T10:11:57.8444220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8444279Z     with policy():
2025-12-04T10:11:57.8444577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8444643Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8445453Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8445462Z 
2025-12-04T10:11:57.8445588Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8446153Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8446157Z 
2025-12-04T10:11:57.8446321Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8446447Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8446579Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8447124Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8447252Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8447319Z graph_break []
2025-12-04T10:11:57.8447439Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8448132Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8448204Z   if out == self.unknown_value:
2025-12-04T10:11:57.8448327Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8448421Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8448544Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8449117Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8449176Z graph_break []
2025-12-04T10:11:57.8449260Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8449552Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8449623Z Traceback (most recent call last):
2025-12-04T10:11:57.8449919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8450018Z     method(*args, **kwargs)
2025-12-04T10:11:57.8450306Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8450376Z     method(*args, **kwargs)
2025-12-04T10:11:57.8450667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8450723Z     with policy():
2025-12-04T10:11:57.8451015Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8451080Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8451892Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8451899Z 
2025-12-04T10:11:57.8452023Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8452545Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8452552Z 
2025-12-04T10:11:57.8452707Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8452827Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8452927Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8453505Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8453660Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8453722Z graph_break []
2025-12-04T10:11:57.8453842Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8454531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8454598Z   if out == self.unknown_value:
2025-12-04T10:11:57.8454718Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8454810Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8454934Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8455475Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8455536Z graph_break []
2025-12-04T10:11:57.8455657Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8455801Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8455925Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8456459Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8456519Z graph_break []
2025-12-04T10:11:57.8457011Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c75b79372dbd5cd7.xml -
2025-12-04T10:11:57.8457113Z =========================== short test summary info ============================
2025-12-04T10:11:57.8458437Z FAILED [0.4938s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8458442Z 
2025-12-04T10:11:57.8458565Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8459080Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8459083Z 
2025-12-04T10:11:57.8459241Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8459346Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8459459Z ================== 1 failed, 57 deselected, 2 rerun in 11.91s ==================
2025-12-04T10:11:57.8459518Z Got exit code 1
2025-12-04T10:11:57.8459993Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8460233Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.8460546Z W1204 10:00:18.180000 58480 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8460938Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6e05e1cd235f382.xml
2025-12-04T10:11:57.8461073Z ============================= test session starts ==============================
2025-12-04T10:11:57.8461283Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8461350Z cachedir: .pytest_cache
2025-12-04T10:11:57.8461666Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8461744Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8461812Z configfile: pytest.ini
2025-12-04T10:11:57.8462122Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8462249Z collecting ... collected 58 items / 43 deselected / 15 selected
2025-12-04T10:11:57.8462340Z stepcurrent: skipping 43 already run items.
2025-12-04T10:11:57.8462408Z Running 15 items in this shard
2025-12-04T10:11:57.8462411Z 
2025-12-04T10:11:57.8462918Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8663s] [  6%]
2025-12-04T10:11:57.8463450Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4743s] [  6%]
2025-12-04T10:11:57.8463895Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4659s] [  6%]
2025-12-04T10:11:57.8463899Z 
2025-12-04T10:11:57.8463985Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8464280Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8464364Z Traceback (most recent call last):
2025-12-04T10:11:57.8464673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8464777Z     method(*args, **kwargs)
2025-12-04T10:11:57.8465075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8465136Z     method(*args, **kwargs)
2025-12-04T10:11:57.8465423Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8465484Z     with policy():
2025-12-04T10:11:57.8465778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8465845Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8466650Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8466656Z 
2025-12-04T10:11:57.8466787Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8467306Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8467311Z 
2025-12-04T10:11:57.8467467Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8467634Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8467729Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8468089Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8468272Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8468330Z graph_break []
2025-12-04T10:11:57.8468624Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8468698Z Traceback (most recent call last):
2025-12-04T10:11:57.8468995Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8469065Z     method(*args, **kwargs)
2025-12-04T10:11:57.8469355Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8469422Z     method(*args, **kwargs)
2025-12-04T10:11:57.8469708Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8469768Z     with policy():
2025-12-04T10:11:57.8470064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8470129Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8470990Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8470996Z 
2025-12-04T10:11:57.8471124Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8471645Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8471653Z 
2025-12-04T10:11:57.8471807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8471971Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8472065Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8472414Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8472537Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8472600Z graph_break []
2025-12-04T10:11:57.8472722Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8472820Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8472938Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8473278Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8473342Z graph_break []
2025-12-04T10:11:57.8473434Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8473730Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8473807Z Traceback (most recent call last):
2025-12-04T10:11:57.8474107Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8474178Z     method(*args, **kwargs)
2025-12-04T10:11:57.8474503Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8474567Z     method(*args, **kwargs)
2025-12-04T10:11:57.8474856Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8474947Z     with policy():
2025-12-04T10:11:57.8475238Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8475304Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8476127Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8476131Z 
2025-12-04T10:11:57.8476257Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8476774Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8476778Z 
2025-12-04T10:11:57.8476935Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8477059Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8477146Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8477523Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8477646Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8477705Z graph_break []
2025-12-04T10:11:57.8477824Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8477912Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8478036Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8478379Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8478472Z graph_break []
2025-12-04T10:11:57.8478596Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8478682Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8478801Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8479137Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8479196Z graph_break []
2025-12-04T10:11:57.8479685Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6e05e1cd235f382.xml -
2025-12-04T10:11:57.8479784Z =========================== short test summary info ============================
2025-12-04T10:11:57.8481133Z FAILED [0.4659s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8481141Z 
2025-12-04T10:11:57.8481264Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8481823Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8481828Z 
2025-12-04T10:11:57.8481983Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8482088Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8482243Z ================== 1 failed, 43 deselected, 2 rerun in 2.83s ===================
2025-12-04T10:11:57.8482300Z Got exit code 1
2025-12-04T10:11:57.8482363Z Retrying single test...
2025-12-04T10:11:57.8482631Z W1204 10:00:27.860000 58668 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8483011Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b5713604e4d5a687.xml
2025-12-04T10:11:57.8483107Z ============================= test session starts ==============================
2025-12-04T10:11:57.8483313Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8483377Z cachedir: .pytest_cache
2025-12-04T10:11:57.8483684Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8483762Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8483827Z configfile: pytest.ini
2025-12-04T10:11:57.8484174Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8484305Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8484877Z stepcurrent: skipping 43 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8484947Z Running 1 items in this shard
2025-12-04T10:11:57.8484951Z 
2025-12-04T10:11:57.8485689Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:00:29.938355325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8485739Z 
2025-12-04T10:11:57.8486040Z [W1204 10:00:37.879377728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8486045Z 
2025-12-04T10:11:57.8486338Z [W1204 10:00:37.879628212 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8486342Z 
2025-12-04T10:11:57.8486626Z [W1204 10:00:37.885654335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8486630Z 
2025-12-04T10:11:57.8486915Z [W1204 10:00:37.886245885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8486923Z 
2025-12-04T10:11:57.8487208Z [W1204 10:00:37.886417048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8487214Z 
2025-12-04T10:11:57.8487498Z [W1204 10:00:37.892095495 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8487501Z 
2025-12-04T10:11:57.8487789Z [W1204 10:00:37.892646885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8487793Z 
2025-12-04T10:11:57.8488079Z [W1204 10:00:37.892810487 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8488082Z 
2025-12-04T10:11:57.8488164Z ('RERUN', {'yellow': True}) [10.8210s] [100%]
2025-12-04T10:11:57.8488926Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:00:39.094988004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8488963Z 
2025-12-04T10:11:57.8489254Z [W1204 10:00:39.095513644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8489258Z 
2025-12-04T10:11:57.8489545Z [W1204 10:00:39.095655816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8489549Z 
2025-12-04T10:11:57.8489833Z [W1204 10:00:39.098611577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8489839Z 
2025-12-04T10:11:57.8490127Z [W1204 10:00:39.099169856 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8490130Z 
2025-12-04T10:11:57.8490418Z [W1204 10:00:39.099310009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8490423Z 
2025-12-04T10:11:57.8490712Z [W1204 10:00:39.103891207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8490716Z 
2025-12-04T10:11:57.8491034Z [W1204 10:00:39.104365085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8491037Z 
2025-12-04T10:11:57.8491328Z [W1204 10:00:39.104502748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8491331Z 
2025-12-04T10:11:57.8491409Z ('RERUN', {'yellow': True}) [0.4479s] [100%]
2025-12-04T10:11:57.8492132Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:00:39.540359883 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8492170Z 
2025-12-04T10:11:57.8492458Z [W1204 10:00:39.540910543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8492461Z 
2025-12-04T10:11:57.8492752Z [W1204 10:00:39.541047715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8492755Z 
2025-12-04T10:11:57.8493040Z [W1204 10:00:39.543910795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8493043Z 
2025-12-04T10:11:57.8493334Z [W1204 10:00:39.544462234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8493344Z 
2025-12-04T10:11:57.8493630Z [W1204 10:00:39.544599467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8493635Z 
2025-12-04T10:11:57.8493920Z [W1204 10:00:39.549075024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8493923Z 
2025-12-04T10:11:57.8494212Z [W1204 10:00:39.549522841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8494215Z 
2025-12-04T10:11:57.8494501Z [W1204 10:00:39.549656574 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8494504Z 
2025-12-04T10:11:57.8494565Z FAILED [0.4432s] [100%]
2025-12-04T10:11:57.8494569Z 
2025-12-04T10:11:57.8494701Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8494997Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8495071Z Traceback (most recent call last):
2025-12-04T10:11:57.8495408Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8495477Z     method(*args, **kwargs)
2025-12-04T10:11:57.8495769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8495833Z     method(*args, **kwargs)
2025-12-04T10:11:57.8496130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8496187Z     with policy():
2025-12-04T10:11:57.8496481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8496545Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8497348Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8497355Z 
2025-12-04T10:11:57.8497516Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8498036Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8498039Z 
2025-12-04T10:11:57.8498196Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8498326Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8498418Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8498767Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8498928Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8498991Z graph_break []
2025-12-04T10:11:57.8499114Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8499807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8499880Z   if out == self.unknown_value:
2025-12-04T10:11:57.8500174Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8500249Z Traceback (most recent call last):
2025-12-04T10:11:57.8500543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8500618Z     method(*args, **kwargs)
2025-12-04T10:11:57.8500915Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8500977Z     method(*args, **kwargs)
2025-12-04T10:11:57.8501266Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8501327Z     with policy():
2025-12-04T10:11:57.8501617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8501684Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8502538Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8502576Z 
2025-12-04T10:11:57.8502703Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8503229Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8503233Z 
2025-12-04T10:11:57.8503386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8503514Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8503606Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8503951Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8504078Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8504135Z graph_break []
2025-12-04T10:11:57.8504264Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8504987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8505072Z   if out == self.unknown_value:
2025-12-04T10:11:57.8505194Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8505283Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8505409Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8505748Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8505808Z graph_break []
2025-12-04T10:11:57.8505892Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8506217Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8506293Z Traceback (most recent call last):
2025-12-04T10:11:57.8506587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8506650Z     method(*args, **kwargs)
2025-12-04T10:11:57.8506940Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8507001Z     method(*args, **kwargs)
2025-12-04T10:11:57.8507297Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8507356Z     with policy():
2025-12-04T10:11:57.8507649Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8507719Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8508541Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8508545Z 
2025-12-04T10:11:57.8508673Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8509226Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8509230Z 
2025-12-04T10:11:57.8509389Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8509509Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8509631Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8509981Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8510101Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8510157Z graph_break []
2025-12-04T10:11:57.8510278Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8510964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8511033Z   if out == self.unknown_value:
2025-12-04T10:11:57.8511151Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8511240Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8511362Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8511737Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8511798Z graph_break []
2025-12-04T10:11:57.8511917Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8512002Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8512122Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8512460Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8512515Z graph_break []
2025-12-04T10:11:57.8513001Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b5713604e4d5a687.xml -
2025-12-04T10:11:57.8513213Z =========================== short test summary info ============================
2025-12-04T10:11:57.8514513Z FAILED [0.4432s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8514521Z 
2025-12-04T10:11:57.8514644Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8515167Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8515174Z 
2025-12-04T10:11:57.8515326Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8515429Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8515548Z ================== 1 failed, 57 deselected, 2 rerun in 11.74s ==================
2025-12-04T10:11:57.8515606Z Got exit code 1
2025-12-04T10:11:57.8515671Z Retrying single test...
2025-12-04T10:11:57.8515969Z W1204 10:00:46.228000 58861 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8516354Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98fe1568229d1f43.xml
2025-12-04T10:11:57.8516448Z ============================= test session starts ==============================
2025-12-04T10:11:57.8516688Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8516757Z cachedir: .pytest_cache
2025-12-04T10:11:57.8517276Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8517354Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8517424Z configfile: pytest.ini
2025-12-04T10:11:57.8517740Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8517868Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8518442Z stepcurrent: skipping 43 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8518513Z Running 1 items in this shard
2025-12-04T10:11:57.8518519Z 
2025-12-04T10:11:57.8519321Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:00:47.324713450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8519328Z 
2025-12-04T10:11:57.8519630Z [W1204 10:00:56.362970795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8519633Z 
2025-12-04T10:11:57.8519966Z [W1204 10:00:56.363228080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8519970Z 
2025-12-04T10:11:57.8520258Z [W1204 10:00:56.369128890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8520263Z 
2025-12-04T10:11:57.8520555Z [W1204 10:00:56.369708280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8520630Z 
2025-12-04T10:11:57.8520922Z [W1204 10:00:56.369890383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8520925Z 
2025-12-04T10:11:57.8521212Z [W1204 10:00:56.375390567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8521218Z 
2025-12-04T10:11:57.8521504Z [W1204 10:00:56.375933896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8521507Z 
2025-12-04T10:11:57.8521792Z [W1204 10:00:56.376088048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8521796Z 
2025-12-04T10:11:57.8521879Z ('RERUN', {'yellow': True}) [10.9390s] [100%]
2025-12-04T10:11:57.8522607Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:00:57.585092379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8522610Z 
2025-12-04T10:11:57.8522908Z [W1204 10:00:57.585628447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8522913Z 
2025-12-04T10:11:57.8523252Z [W1204 10:00:57.585769559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8523256Z 
2025-12-04T10:11:57.8523546Z [W1204 10:00:57.588701926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8523550Z 
2025-12-04T10:11:57.8523880Z [W1204 10:00:57.589269435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8523885Z 
2025-12-04T10:11:57.8524174Z [W1204 10:00:57.589410168 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8524177Z 
2025-12-04T10:11:57.8524462Z [W1204 10:00:57.594006280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8524465Z 
2025-12-04T10:11:57.8524749Z [W1204 10:00:57.594473348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8524754Z 
2025-12-04T10:11:57.8525043Z [W1204 10:00:57.594610300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8525046Z 
2025-12-04T10:11:57.8525122Z ('RERUN', {'yellow': True}) [0.4529s] [100%]
2025-12-04T10:11:57.8525889Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:00:58.035982556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8525893Z 
2025-12-04T10:11:57.8526181Z [W1204 10:00:58.036526614 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8526184Z 
2025-12-04T10:11:57.8526475Z [W1204 10:00:58.036667017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8526480Z 
2025-12-04T10:11:57.8526764Z [W1204 10:00:58.039550743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8526767Z 
2025-12-04T10:11:57.8527055Z [W1204 10:00:58.040128392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8527101Z 
2025-12-04T10:11:57.8527393Z [W1204 10:00:58.040272584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8527396Z 
2025-12-04T10:11:57.8527681Z [W1204 10:00:58.044736425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8527687Z 
2025-12-04T10:11:57.8527970Z [W1204 10:00:58.045188453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8527973Z 
2025-12-04T10:11:57.8528258Z [W1204 10:00:58.045324695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8528262Z 
2025-12-04T10:11:57.8528325Z FAILED [0.4477s] [100%]
2025-12-04T10:11:57.8528330Z 
2025-12-04T10:11:57.8528417Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8528711Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8528784Z Traceback (most recent call last):
2025-12-04T10:11:57.8529087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8529160Z     method(*args, **kwargs)
2025-12-04T10:11:57.8529452Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8529558Z     method(*args, **kwargs)
2025-12-04T10:11:57.8529849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8529907Z     with policy():
2025-12-04T10:11:57.8530200Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8530302Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8531103Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8531109Z 
2025-12-04T10:11:57.8531238Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8531762Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8531765Z 
2025-12-04T10:11:57.8531924Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8532051Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8532148Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8532529Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8532658Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8532718Z graph_break []
2025-12-04T10:11:57.8532838Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8533529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8533601Z   if out == self.unknown_value:
2025-12-04T10:11:57.8533892Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8534007Z Traceback (most recent call last):
2025-12-04T10:11:57.8534302Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8534365Z     method(*args, **kwargs)
2025-12-04T10:11:57.8534659Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8534721Z     method(*args, **kwargs)
2025-12-04T10:11:57.8535021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8535083Z     with policy():
2025-12-04T10:11:57.8535371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8535436Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8536264Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8536269Z 
2025-12-04T10:11:57.8536395Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8536913Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8536917Z 
2025-12-04T10:11:57.8537107Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8537235Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8537325Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8537709Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8537854Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8537914Z graph_break []
2025-12-04T10:11:57.8538039Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8538728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8538799Z   if out == self.unknown_value:
2025-12-04T10:11:57.8538921Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8539009Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8539134Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8539482Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8539574Z graph_break []
2025-12-04T10:11:57.8539663Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8539954Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8540030Z Traceback (most recent call last):
2025-12-04T10:11:57.8540327Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8540390Z     method(*args, **kwargs)
2025-12-04T10:11:57.8540683Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8540744Z     method(*args, **kwargs)
2025-12-04T10:11:57.8541067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8541126Z     with policy():
2025-12-04T10:11:57.8541418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8541486Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8542306Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8542310Z 
2025-12-04T10:11:57.8542438Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8542957Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8542964Z 
2025-12-04T10:11:57.8543122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8543245Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8543333Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8543676Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8543831Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8543891Z graph_break []
2025-12-04T10:11:57.8544014Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8544697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8544803Z   if out == self.unknown_value:
2025-12-04T10:11:57.8544923Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8545013Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8545137Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8545474Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8545532Z graph_break []
2025-12-04T10:11:57.8545655Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8545740Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8545862Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8546201Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8546312Z graph_break []
2025-12-04T10:11:57.8546806Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98fe1568229d1f43.xml -
2025-12-04T10:11:57.8546907Z =========================== short test summary info ============================
2025-12-04T10:11:57.8548213Z FAILED [0.4477s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8548257Z 
2025-12-04T10:11:57.8548380Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8548903Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8548906Z 
2025-12-04T10:11:57.8549059Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8549164Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8549282Z ================== 1 failed, 57 deselected, 2 rerun in 11.86s ==================
2025-12-04T10:11:57.8549340Z Got exit code 1
2025-12-04T10:11:57.8549812Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8550057Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.8550316Z W1204 10:01:04.690000 59054 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8550702Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-89a0569137f2a5f8.xml
2025-12-04T10:11:57.8550796Z ============================= test session starts ==============================
2025-12-04T10:11:57.8551036Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8551102Z cachedir: .pytest_cache
2025-12-04T10:11:57.8551405Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8551529Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8551594Z configfile: pytest.ini
2025-12-04T10:11:57.8551908Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8552037Z collecting ... collected 58 items / 44 deselected / 14 selected
2025-12-04T10:11:57.8552123Z stepcurrent: skipping 44 already run items.
2025-12-04T10:11:57.8552197Z Running 14 items in this shard
2025-12-04T10:11:57.8552203Z 
2025-12-04T10:11:57.8552700Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8805s] [  7%]
2025-12-04T10:11:57.8553182Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4589s] [  7%]
2025-12-04T10:11:57.8553622Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4604s] [  7%]
2025-12-04T10:11:57.8553628Z 
2025-12-04T10:11:57.8553751Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8554049Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8554122Z Traceback (most recent call last):
2025-12-04T10:11:57.8554428Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8554496Z     method(*args, **kwargs)
2025-12-04T10:11:57.8554787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8554850Z     method(*args, **kwargs)
2025-12-04T10:11:57.8555137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8555232Z     with policy():
2025-12-04T10:11:57.8555528Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8555596Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8556387Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8556393Z 
2025-12-04T10:11:57.8556519Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8557035Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8557042Z 
2025-12-04T10:11:57.8557200Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8557326Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8557421Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8557766Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8557892Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8557952Z graph_break []
2025-12-04T10:11:57.8558273Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8558348Z Traceback (most recent call last):
2025-12-04T10:11:57.8558645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8558740Z     method(*args, **kwargs)
2025-12-04T10:11:57.8559031Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8559092Z     method(*args, **kwargs)
2025-12-04T10:11:57.8559377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8559437Z     with policy():
2025-12-04T10:11:57.8559725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8559792Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8560662Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8560671Z 
2025-12-04T10:11:57.8560795Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8561354Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8561358Z 
2025-12-04T10:11:57.8561513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8561639Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8561730Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8562071Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8562196Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8562288Z graph_break []
2025-12-04T10:11:57.8562410Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8562502Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8562624Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8562979Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8563040Z graph_break []
2025-12-04T10:11:57.8563125Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8563416Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8563488Z Traceback (most recent call last):
2025-12-04T10:11:57.8563787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8563857Z     method(*args, **kwargs)
2025-12-04T10:11:57.8564144Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8564211Z     method(*args, **kwargs)
2025-12-04T10:11:57.8564496Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8564555Z     with policy():
2025-12-04T10:11:57.8564846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8564910Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8565764Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8565801Z 
2025-12-04T10:11:57.8565928Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8566448Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8566452Z 
2025-12-04T10:11:57.8566608Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8566732Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8566830Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8567183Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8567311Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8567374Z graph_break []
2025-12-04T10:11:57.8567493Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8567582Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8567745Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8568084Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8568145Z graph_break []
2025-12-04T10:11:57.8568265Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8568354Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8568472Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8568805Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8568903Z graph_break []
2025-12-04T10:11:57.8569390Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-89a0569137f2a5f8.xml -
2025-12-04T10:11:57.8569491Z =========================== short test summary info ============================
2025-12-04T10:11:57.8570771Z FAILED [0.4604s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8570776Z 
2025-12-04T10:11:57.8570903Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8571422Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8571426Z 
2025-12-04T10:11:57.8571581Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8571686Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8571800Z ================== 1 failed, 44 deselected, 2 rerun in 2.82s ===================
2025-12-04T10:11:57.8571860Z Got exit code 1
2025-12-04T10:11:57.8571957Z Retrying single test...
2025-12-04T10:11:57.8572231Z W1204 10:01:14.338000 59235 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8572620Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-26852b57f22709e5.xml
2025-12-04T10:11:57.8572767Z ============================= test session starts ==============================
2025-12-04T10:11:57.8572983Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8573051Z cachedir: .pytest_cache
2025-12-04T10:11:57.8573362Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8573441Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8573503Z configfile: pytest.ini
2025-12-04T10:11:57.8573823Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8573956Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8574532Z stepcurrent: skipping 44 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8574608Z Running 1 items in this shard
2025-12-04T10:11:57.8574612Z 
2025-12-04T10:11:57.8575375Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:01:15.390594130 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8575381Z 
2025-12-04T10:11:57.8575685Z [W1204 10:01:24.553909126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8575692Z 
2025-12-04T10:11:57.8575985Z [W1204 10:01:24.554198271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8575988Z 
2025-12-04T10:11:57.8576275Z [W1204 10:01:24.559967450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8576313Z 
2025-12-04T10:11:57.8576607Z [W1204 10:01:24.560591811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8576610Z 
2025-12-04T10:11:57.8576898Z [W1204 10:01:24.560778714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8576901Z 
2025-12-04T10:11:57.8577192Z [W1204 10:01:24.566218548 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8577197Z 
2025-12-04T10:11:57.8577482Z [W1204 10:01:24.566763667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8577486Z 
2025-12-04T10:11:57.8577775Z [W1204 10:01:24.566939020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8577780Z 
2025-12-04T10:11:57.8577861Z ('RERUN', {'yellow': True}) [11.0205s] [100%]
2025-12-04T10:11:57.8578585Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:01:25.738889840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8578589Z 
2025-12-04T10:11:57.8578874Z [W1204 10:01:25.739449529 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8578914Z 
2025-12-04T10:11:57.8579200Z [W1204 10:01:25.739592461 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8579206Z 
2025-12-04T10:11:57.8579487Z [W1204 10:01:25.742609773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8579525Z 
2025-12-04T10:11:57.8579811Z [W1204 10:01:25.743182543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8579814Z 
2025-12-04T10:11:57.8580115Z [W1204 10:01:25.743323826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8580118Z 
2025-12-04T10:11:57.8580402Z [W1204 10:01:25.747902335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8580405Z 
2025-12-04T10:11:57.8580693Z [W1204 10:01:25.748373163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8580696Z 
2025-12-04T10:11:57.8580979Z [W1204 10:01:25.748513095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8580986Z 
2025-12-04T10:11:57.8581066Z ('RERUN', {'yellow': True}) [0.4208s] [100%]
2025-12-04T10:11:57.8581818Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:01:26.157378251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8581822Z 
2025-12-04T10:11:57.8582114Z [W1204 10:01:26.157916600 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8582117Z 
2025-12-04T10:11:57.8582416Z [W1204 10:01:26.158059193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8582419Z 
2025-12-04T10:11:57.8582709Z [W1204 10:01:26.161015894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8582764Z 
2025-12-04T10:11:57.8583052Z [W1204 10:01:26.161565463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8583057Z 
2025-12-04T10:11:57.8583343Z [W1204 10:01:26.161704656 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8583346Z 
2025-12-04T10:11:57.8583634Z [W1204 10:01:26.166186573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8583638Z 
2025-12-04T10:11:57.8583924Z [W1204 10:01:26.166649351 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8583927Z 
2025-12-04T10:11:57.8584216Z [W1204 10:01:26.166785503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8584223Z 
2025-12-04T10:11:57.8584284Z FAILED [0.4136s] [100%]
2025-12-04T10:11:57.8584287Z 
2025-12-04T10:11:57.8584372Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8584673Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8584747Z Traceback (most recent call last):
2025-12-04T10:11:57.8585061Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8585124Z     method(*args, **kwargs)
2025-12-04T10:11:57.8585456Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8585523Z     method(*args, **kwargs)
2025-12-04T10:11:57.8585811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8585908Z     with policy():
2025-12-04T10:11:57.8586202Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8586267Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8587072Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8587077Z 
2025-12-04T10:11:57.8587206Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8587729Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8587735Z 
2025-12-04T10:11:57.8587894Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8588032Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8588166Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8588519Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8588648Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8588704Z graph_break []
2025-12-04T10:11:57.8588827Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8589524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8589629Z   if out == self.unknown_value:
2025-12-04T10:11:57.8589925Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8589998Z Traceback (most recent call last):
2025-12-04T10:11:57.8590296Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8590362Z     method(*args, **kwargs)
2025-12-04T10:11:57.8590650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8590713Z     method(*args, **kwargs)
2025-12-04T10:11:57.8591003Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8591060Z     with policy():
2025-12-04T10:11:57.8591369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8591439Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8592251Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8592258Z 
2025-12-04T10:11:57.8592383Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8592937Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8592941Z 
2025-12-04T10:11:57.8593106Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8593230Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8593361Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8593714Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8593840Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8593900Z graph_break []
2025-12-04T10:11:57.8594021Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8594712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8594783Z   if out == self.unknown_value:
2025-12-04T10:11:57.8594903Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8594999Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8595121Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8595499Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8595570Z graph_break []
2025-12-04T10:11:57.8595653Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8595945Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8596016Z Traceback (most recent call last):
2025-12-04T10:11:57.8596313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8596379Z     method(*args, **kwargs)
2025-12-04T10:11:57.8596673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8596768Z     method(*args, **kwargs)
2025-12-04T10:11:57.8597063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8597120Z     with policy():
2025-12-04T10:11:57.8597417Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8597485Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8598295Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8598299Z 
2025-12-04T10:11:57.8598426Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8598946Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8598950Z 
2025-12-04T10:11:57.8599104Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8599226Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8599313Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8599711Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8599833Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8599928Z graph_break []
2025-12-04T10:11:57.8600052Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8600776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8600848Z   if out == self.unknown_value:
2025-12-04T10:11:57.8600979Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8601072Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8601193Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8601533Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8601592Z graph_break []
2025-12-04T10:11:57.8601710Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8601801Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8601921Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8602300Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8602362Z graph_break []
2025-12-04T10:11:57.8602849Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-26852b57f22709e5.xml -
2025-12-04T10:11:57.8602949Z =========================== short test summary info ============================
2025-12-04T10:11:57.8604232Z FAILED [0.4136s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8604274Z 
2025-12-04T10:11:57.8604399Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8604915Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8604919Z 
2025-12-04T10:11:57.8605073Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8605180Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8605294Z ================== 1 failed, 57 deselected, 2 rerun in 11.88s ==================
2025-12-04T10:11:57.8605352Z Got exit code 1
2025-12-04T10:11:57.8605421Z Retrying single test...
2025-12-04T10:11:57.8605681Z W1204 10:01:32.812000 59421 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8606071Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51aaf4e0af1c22f7.xml
2025-12-04T10:11:57.8606164Z ============================= test session starts ==============================
2025-12-04T10:11:57.8606376Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8606443Z cachedir: .pytest_cache
2025-12-04T10:11:57.8606783Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8606859Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8606925Z configfile: pytest.ini
2025-12-04T10:11:57.8607250Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8607419Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8607993Z stepcurrent: skipping 44 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8608062Z Running 1 items in this shard
2025-12-04T10:11:57.8608065Z 
2025-12-04T10:11:57.8608793Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:01:33.859292993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8608797Z 
2025-12-04T10:11:57.8609095Z [W1204 10:01:42.890559173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8609101Z 
2025-12-04T10:11:57.8609399Z [W1204 10:01:42.890797517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8609402Z 
2025-12-04T10:11:57.8609729Z [W1204 10:01:42.896597156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8609733Z 
2025-12-04T10:11:57.8610027Z [W1204 10:01:42.897143645 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8610030Z 
2025-12-04T10:11:57.8610317Z [W1204 10:01:42.897327789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8610320Z 
2025-12-04T10:11:57.8610607Z [W1204 10:01:42.902841603 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8610612Z 
2025-12-04T10:11:57.8610931Z [W1204 10:01:42.903391293 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8610935Z 
2025-12-04T10:11:57.8611225Z [W1204 10:01:42.903564586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8611228Z 
2025-12-04T10:11:57.8611309Z ('RERUN', {'yellow': True}) [10.8872s] [100%]
2025-12-04T10:11:57.8612028Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:01:44.080884402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8612033Z 
2025-12-04T10:11:57.8612326Z [W1204 10:01:44.081435572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8612333Z 
2025-12-04T10:11:57.8612618Z [W1204 10:01:44.081578694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8612621Z 
2025-12-04T10:11:57.8612915Z [W1204 10:01:44.084549815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8612918Z 
2025-12-04T10:11:57.8613205Z [W1204 10:01:44.085119195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8613208Z 
2025-12-04T10:11:57.8613531Z [W1204 10:01:44.085262037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8613534Z 
2025-12-04T10:11:57.8613820Z [W1204 10:01:44.089829805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8613856Z 
2025-12-04T10:11:57.8614147Z [W1204 10:01:44.090319683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8614151Z 
2025-12-04T10:11:57.8614438Z [W1204 10:01:44.090460826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8614442Z 
2025-12-04T10:11:57.8614523Z ('RERUN', {'yellow': True}) [0.4190s] [100%]
2025-12-04T10:11:57.8615243Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:01:44.494546687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8615247Z 
2025-12-04T10:11:57.8615535Z [W1204 10:01:44.495114257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8615540Z 
2025-12-04T10:11:57.8615829Z [W1204 10:01:44.495256889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8615832Z 
2025-12-04T10:11:57.8616151Z [W1204 10:01:44.498258410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8616155Z 
2025-12-04T10:11:57.8616449Z [W1204 10:01:44.498823130 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8616452Z 
2025-12-04T10:11:57.8616741Z [W1204 10:01:44.498960733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8616744Z 
2025-12-04T10:11:57.8617185Z [W1204 10:01:44.503576872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8617190Z 
2025-12-04T10:11:57.8617485Z [W1204 10:01:44.504035090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8617551Z 
2025-12-04T10:11:57.8617846Z [W1204 10:01:44.504169742 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8617849Z 
2025-12-04T10:11:57.8617911Z FAILED [0.4135s] [100%]
2025-12-04T10:11:57.8617915Z 
2025-12-04T10:11:57.8617998Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8618294Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8618369Z Traceback (most recent call last):
2025-12-04T10:11:57.8618686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8618750Z     method(*args, **kwargs)
2025-12-04T10:11:57.8619042Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8619116Z     method(*args, **kwargs)
2025-12-04T10:11:57.8619416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8619475Z     with policy():
2025-12-04T10:11:57.8619771Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8619836Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8620685Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8620689Z 
2025-12-04T10:11:57.8620861Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8621379Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8621385Z 
2025-12-04T10:11:57.8621540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8621666Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8621761Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8622109Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8622235Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8622296Z graph_break []
2025-12-04T10:11:57.8622417Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8623242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8623313Z   if out == self.unknown_value:
2025-12-04T10:11:57.8623602Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8623674Z Traceback (most recent call last):
2025-12-04T10:11:57.8623973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8624037Z     method(*args, **kwargs)
2025-12-04T10:11:57.8624329Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8624390Z     method(*args, **kwargs)
2025-12-04T10:11:57.8624746Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8624803Z     with policy():
2025-12-04T10:11:57.8625098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8625168Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8625974Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8625979Z 
2025-12-04T10:11:57.8626104Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8626618Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8626625Z 
2025-12-04T10:11:57.8626781Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8626905Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8626997Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8627354Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8627480Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8627817Z graph_break []
2025-12-04T10:11:57.8627953Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8628642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8628751Z   if out == self.unknown_value:
2025-12-04T10:11:57.8628874Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8628965Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8629092Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8629510Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8629573Z graph_break []
2025-12-04T10:11:57.8629675Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8629966Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8630045Z Traceback (most recent call last):
2025-12-04T10:11:57.8630340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8630401Z     method(*args, **kwargs)
2025-12-04T10:11:57.8630727Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8630789Z     method(*args, **kwargs)
2025-12-04T10:11:57.8631081Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8631138Z     with policy():
2025-12-04T10:11:57.8631429Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8631496Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8632302Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8632342Z 
2025-12-04T10:11:57.8632470Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8632986Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8632990Z 
2025-12-04T10:11:57.8633147Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8633270Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8633369Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8633715Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8633839Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8633897Z graph_break []
2025-12-04T10:11:57.8634020Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8634704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8634773Z   if out == self.unknown_value:
2025-12-04T10:11:57.8634929Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8635017Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8635143Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8635483Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8635577Z graph_break []
2025-12-04T10:11:57.8635701Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8635788Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8635907Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8636244Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8636302Z graph_break []
2025-12-04T10:11:57.8636794Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51aaf4e0af1c22f7.xml -
2025-12-04T10:11:57.8636893Z =========================== short test summary info ============================
2025-12-04T10:11:57.8638217Z FAILED [0.4135s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8638222Z 
2025-12-04T10:11:57.8638348Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8638865Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8638868Z 
2025-12-04T10:11:57.8639022Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8639160Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8639278Z ================== 1 failed, 57 deselected, 2 rerun in 11.74s ==================
2025-12-04T10:11:57.8639336Z Got exit code 1
2025-12-04T10:11:57.8639807Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8640105Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.8640369Z W1204 10:01:51.166000 59607 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8640754Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dc138e7c3d90d405.xml
2025-12-04T10:11:57.8640848Z ============================= test session starts ==============================
2025-12-04T10:11:57.8641059Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8641124Z cachedir: .pytest_cache
2025-12-04T10:11:57.8641430Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8641506Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8641571Z configfile: pytest.ini
2025-12-04T10:11:57.8641888Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8642054Z collecting ... collected 58 items / 45 deselected / 13 selected
2025-12-04T10:11:57.8642144Z stepcurrent: skipping 45 already run items.
2025-12-04T10:11:57.8642216Z Running 13 items in this shard
2025-12-04T10:11:57.8642220Z 
2025-12-04T10:11:57.8642712Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9207s] [  7%]
2025-12-04T10:11:57.8643237Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5308s] [  7%]
2025-12-04T10:11:57.8643677Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.5265s] [  7%]
2025-12-04T10:11:57.8643680Z 
2025-12-04T10:11:57.8643761Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8644067Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8644142Z Traceback (most recent call last):
2025-12-04T10:11:57.8644449Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8644516Z     method(*args, **kwargs)
2025-12-04T10:11:57.8644846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8644913Z     method(*args, **kwargs)
2025-12-04T10:11:57.8645205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8645262Z     with policy():
2025-12-04T10:11:57.8645560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8645626Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8646428Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8646468Z 
2025-12-04T10:11:57.8646594Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8647115Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8647119Z 
2025-12-04T10:11:57.8647274Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8647399Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8647492Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8648035Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8648167Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8648224Z graph_break []
2025-12-04T10:11:57.8648513Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8648590Z Traceback (most recent call last):
2025-12-04T10:11:57.8648887Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8648947Z     method(*args, **kwargs)
2025-12-04T10:11:57.8649282Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8649345Z     method(*args, **kwargs)
2025-12-04T10:11:57.8649640Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8649697Z     with policy():
2025-12-04T10:11:57.8650038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8650107Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8650912Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8650917Z 
2025-12-04T10:11:57.8651039Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8651551Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8651555Z 
2025-12-04T10:11:57.8651707Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8651835Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8651924Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8652516Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8652641Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8652698Z graph_break []
2025-12-04T10:11:57.8652827Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8652914Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8653036Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8653569Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8653673Z graph_break []
2025-12-04T10:11:57.8653761Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8654048Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8654119Z Traceback (most recent call last):
2025-12-04T10:11:57.8654418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8654482Z     method(*args, **kwargs)
2025-12-04T10:11:57.8654777Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8654839Z     method(*args, **kwargs)
2025-12-04T10:11:57.8655126Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8655192Z     with policy():
2025-12-04T10:11:57.8655485Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8655555Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8656363Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8656403Z 
2025-12-04T10:11:57.8656528Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8657045Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8657083Z 
2025-12-04T10:11:57.8657237Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8657367Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8657454Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8657993Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8658117Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8658174Z graph_break []
2025-12-04T10:11:57.8658299Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8658385Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8658504Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8659073Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8659131Z graph_break []
2025-12-04T10:11:57.8659256Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8659340Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8659457Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8659999Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8660063Z graph_break []
2025-12-04T10:11:57.8660552Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dc138e7c3d90d405.xml -
2025-12-04T10:11:57.8660693Z =========================== short test summary info ============================
2025-12-04T10:11:57.8661977Z FAILED [0.5265s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8661985Z 
2025-12-04T10:11:57.8662106Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8662620Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8662626Z 
2025-12-04T10:11:57.8662784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8662888Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8663004Z ================== 1 failed, 45 deselected, 2 rerun in 3.00s ===================
2025-12-04T10:11:57.8663062Z Got exit code 1
2025-12-04T10:11:57.8663125Z Retrying single test...
2025-12-04T10:11:57.8663421Z W1204 10:02:00.782000 59789 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8663803Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11f1088c00e16c8c.xml
2025-12-04T10:11:57.8663895Z ============================= test session starts ==============================
2025-12-04T10:11:57.8664138Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8664202Z cachedir: .pytest_cache
2025-12-04T10:11:57.8664514Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8664587Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8664651Z configfile: pytest.ini
2025-12-04T10:11:57.8664966Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8665095Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8665662Z stepcurrent: skipping 45 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8665735Z Running 1 items in this shard
2025-12-04T10:11:57.8665740Z 
2025-12-04T10:11:57.8666500Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:02:02.358959509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8666505Z 
2025-12-04T10:11:57.8666805Z [W1204 10:02:11.221429124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8666808Z 
2025-12-04T10:11:57.8667100Z [W1204 10:02:11.221698429 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8667103Z 
2025-12-04T10:11:57.8667396Z [W1204 10:02:11.227782462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8667400Z 
2025-12-04T10:11:57.8667722Z [W1204 10:02:11.228396562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8667725Z 
2025-12-04T10:11:57.8668021Z [W1204 10:02:11.228585156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8668026Z 
2025-12-04T10:11:57.8668320Z [W1204 10:02:11.234108710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8668323Z 
2025-12-04T10:11:57.8668611Z [W1204 10:02:11.234659119 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8668615Z 
2025-12-04T10:11:57.8668901Z [W1204 10:02:11.234828582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8668905Z 
2025-12-04T10:11:57.8668983Z ('RERUN', {'yellow': True}) [10.7946s] [100%]
2025-12-04T10:11:57.8669717Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:02:12.028548844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8669721Z 
2025-12-04T10:11:57.8670007Z [W1204 10:02:12.029077823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8670010Z 
2025-12-04T10:11:57.8670333Z [W1204 10:02:12.029220096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8670337Z 
2025-12-04T10:11:57.8670623Z [W1204 10:02:12.032160845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8670628Z 
2025-12-04T10:11:57.8670947Z [W1204 10:02:12.032617723 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8670952Z 
2025-12-04T10:11:57.8671241Z [W1204 10:02:12.032754675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8671244Z 
2025-12-04T10:11:57.8671533Z [W1204 10:02:12.037228941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8671536Z 
2025-12-04T10:11:57.8671824Z [W1204 10:02:12.037677568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8671827Z 
2025-12-04T10:11:57.8672118Z [W1204 10:02:12.037815561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8672121Z 
2025-12-04T10:11:57.8672198Z ('RERUN', {'yellow': True}) [0.4937s] [100%]
2025-12-04T10:11:57.8672955Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:02:12.521523918 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8672959Z 
2025-12-04T10:11:57.8673252Z [W1204 10:02:12.522047657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8673255Z 
2025-12-04T10:11:57.8673546Z [W1204 10:02:12.522188479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8673549Z 
2025-12-04T10:11:57.8673837Z [W1204 10:02:12.525072228 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8673840Z 
2025-12-04T10:11:57.8674128Z [W1204 10:02:12.525517825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8674165Z 
2025-12-04T10:11:57.8674455Z [W1204 10:02:12.525656498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8674458Z 
2025-12-04T10:11:57.8674742Z [W1204 10:02:12.530153474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8674745Z 
2025-12-04T10:11:57.8675037Z [W1204 10:02:12.530610562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8675044Z 
2025-12-04T10:11:57.8675338Z [W1204 10:02:12.530748014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8675341Z 
2025-12-04T10:11:57.8675401Z FAILED [0.4904s] [100%]
2025-12-04T10:11:57.8675406Z 
2025-12-04T10:11:57.8675494Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8675786Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8675862Z Traceback (most recent call last):
2025-12-04T10:11:57.8676164Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8676227Z     method(*args, **kwargs)
2025-12-04T10:11:57.8676523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8676623Z     method(*args, **kwargs)
2025-12-04T10:11:57.8676916Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8676975Z     with policy():
2025-12-04T10:11:57.8677269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8677391Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8678190Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8678194Z 
2025-12-04T10:11:57.8678323Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8678853Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8678857Z 
2025-12-04T10:11:57.8679017Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8679148Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8679240Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8679839Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8680006Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8680065Z graph_break []
2025-12-04T10:11:57.8680189Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8680883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8680957Z   if out == self.unknown_value:
2025-12-04T10:11:57.8681285Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8681357Z Traceback (most recent call last):
2025-12-04T10:11:57.8681659Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8681720Z     method(*args, **kwargs)
2025-12-04T10:11:57.8682008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8682076Z     method(*args, **kwargs)
2025-12-04T10:11:57.8682364Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8682425Z     with policy():
2025-12-04T10:11:57.8682716Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8682783Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8683593Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8683597Z 
2025-12-04T10:11:57.8683720Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8684279Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8684284Z 
2025-12-04T10:11:57.8684440Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8684567Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8684692Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8685241Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8685369Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8685425Z graph_break []
2025-12-04T10:11:57.8685545Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8686235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8686305Z   if out == self.unknown_value:
2025-12-04T10:11:57.8686427Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8686519Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8686639Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8687213Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8687272Z graph_break []
2025-12-04T10:11:57.8687359Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8687649Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8687721Z Traceback (most recent call last):
2025-12-04T10:11:57.8688020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8688090Z     method(*args, **kwargs)
2025-12-04T10:11:57.8688418Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8688496Z     method(*args, **kwargs)
2025-12-04T10:11:57.8688787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8688849Z     with policy():
2025-12-04T10:11:57.8689140Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8689206Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8690020Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8690027Z 
2025-12-04T10:11:57.8690149Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8690667Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8690670Z 
2025-12-04T10:11:57.8690822Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8690947Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8691050Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8691844Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8691985Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8692082Z graph_break []
2025-12-04T10:11:57.8692210Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8692901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8692969Z   if out == self.unknown_value:
2025-12-04T10:11:57.8693095Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8693186Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8693308Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8693847Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8693907Z graph_break []
2025-12-04T10:11:57.8694031Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8694164Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8694288Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8694828Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8694886Z graph_break []
2025-12-04T10:11:57.8695376Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11f1088c00e16c8c.xml -
2025-12-04T10:11:57.8695476Z =========================== short test summary info ============================
2025-12-04T10:11:57.8696800Z FAILED [0.4904s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8696808Z 
2025-12-04T10:11:57.8696935Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8697452Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8697457Z 
2025-12-04T10:11:57.8697621Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8697727Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8697843Z ================== 1 failed, 57 deselected, 2 rerun in 11.80s ==================
2025-12-04T10:11:57.8697902Z Got exit code 1
2025-12-04T10:11:57.8697965Z Retrying single test...
2025-12-04T10:11:57.8698243Z W1204 10:02:19.180000 59975 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8698623Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3523d5aaa7729d0c.xml
2025-12-04T10:11:57.8698750Z ============================= test session starts ==============================
2025-12-04T10:11:57.8698962Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8699028Z cachedir: .pytest_cache
2025-12-04T10:11:57.8699332Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8699443Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8699508Z configfile: pytest.ini
2025-12-04T10:11:57.8699843Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8699975Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8700544Z stepcurrent: skipping 45 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8700616Z Running 1 items in this shard
2025-12-04T10:11:57.8700620Z 
2025-12-04T10:11:57.8701344Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:02:20.767520727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8701356Z 
2025-12-04T10:11:57.8701691Z [W1204 10:02:29.889076346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8701695Z 
2025-12-04T10:11:57.8701989Z [W1204 10:02:29.889348501 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8701992Z 
2025-12-04T10:11:57.8702284Z [W1204 10:02:29.895409844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8702289Z 
2025-12-04T10:11:57.8702574Z [W1204 10:02:29.896010345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8702577Z 
2025-12-04T10:11:57.8702866Z [W1204 10:02:29.896190538 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8702905Z 
2025-12-04T10:11:57.8703198Z [W1204 10:02:29.901672291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8703201Z 
2025-12-04T10:11:57.8703492Z [W1204 10:02:29.902209240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8703495Z 
2025-12-04T10:11:57.8703780Z [W1204 10:02:29.902366433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8703783Z 
2025-12-04T10:11:57.8703872Z ('RERUN', {'yellow': True}) [11.0649s] [100%]
2025-12-04T10:11:57.8704590Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:02:30.704441237 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8704597Z 
2025-12-04T10:11:57.8704885Z [W1204 10:02:30.704974946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8704888Z 
2025-12-04T10:11:57.8705179Z [W1204 10:02:30.705111959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8705182Z 
2025-12-04T10:11:57.8705470Z [W1204 10:02:30.707925417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8705473Z 
2025-12-04T10:11:57.8705817Z [W1204 10:02:30.708378405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8705821Z 
2025-12-04T10:11:57.8706108Z [W1204 10:02:30.708513837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8706147Z 
2025-12-04T10:11:57.8706439Z [W1204 10:02:30.713026294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8706444Z 
2025-12-04T10:11:57.8706730Z [W1204 10:02:30.713477582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8706734Z 
2025-12-04T10:11:57.8707023Z [W1204 10:02:30.713615894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8707026Z 
2025-12-04T10:11:57.8707106Z ('RERUN', {'yellow': True}) [0.4995s] [100%]
2025-12-04T10:11:57.8707826Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:02:31.200790172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8707836Z 
2025-12-04T10:11:57.8708163Z [W1204 10:02:31.201334241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8708167Z 
2025-12-04T10:11:57.8708453Z [W1204 10:02:31.201473494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8708456Z 
2025-12-04T10:11:57.8708747Z [W1204 10:02:31.204290682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8708750Z 
2025-12-04T10:11:57.8709040Z [W1204 10:02:31.204743759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8709043Z 
2025-12-04T10:11:57.8709337Z [W1204 10:02:31.204883702 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8709375Z 
2025-12-04T10:11:57.8709663Z [W1204 10:02:31.209247446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8709668Z 
2025-12-04T10:11:57.8709956Z [W1204 10:02:31.209693114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8709960Z 
2025-12-04T10:11:57.8710244Z [W1204 10:02:31.209827976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8710247Z 
2025-12-04T10:11:57.8710309Z FAILED [0.4949s] [100%]
2025-12-04T10:11:57.8710314Z 
2025-12-04T10:11:57.8710395Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8710685Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8710765Z Traceback (most recent call last):
2025-12-04T10:11:57.8711077Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8711144Z     method(*args, **kwargs)
2025-12-04T10:11:57.8711442Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8711504Z     method(*args, **kwargs)
2025-12-04T10:11:57.8711802Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8711861Z     with policy():
2025-12-04T10:11:57.8712191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8712264Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8713058Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8713099Z 
2025-12-04T10:11:57.8713232Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8713758Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8713762Z 
2025-12-04T10:11:57.8713923Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8714053Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8714152Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8714705Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8714839Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8714932Z graph_break []
2025-12-04T10:11:57.8715066Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8715756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8715834Z   if out == self.unknown_value:
2025-12-04T10:11:57.8716128Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8716203Z Traceback (most recent call last):
2025-12-04T10:11:57.8716510Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8716608Z     method(*args, **kwargs)
2025-12-04T10:11:57.8716904Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8716965Z     method(*args, **kwargs)
2025-12-04T10:11:57.8717414Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8717479Z     with policy():
2025-12-04T10:11:57.8717777Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8717842Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8718654Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8718661Z 
2025-12-04T10:11:57.8718790Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8719309Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8719312Z 
2025-12-04T10:11:57.8719474Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8719606Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8719771Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8720405Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8720593Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8720652Z graph_break []
2025-12-04T10:11:57.8720782Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8721467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8721536Z   if out == self.unknown_value:
2025-12-04T10:11:57.8721664Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8721757Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8721880Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8722425Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8722485Z graph_break []
2025-12-04T10:11:57.8722632Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8722929Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8723001Z Traceback (most recent call last):
2025-12-04T10:11:57.8723305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8723369Z     method(*args, **kwargs)
2025-12-04T10:11:57.8723662Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8723724Z     method(*args, **kwargs)
2025-12-04T10:11:57.8724016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8724125Z     with policy():
2025-12-04T10:11:57.8724421Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8724487Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8725297Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8725301Z 
2025-12-04T10:11:57.8725426Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8725947Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8725954Z 
2025-12-04T10:11:57.8726111Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8726240Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8726330Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8726871Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8727033Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8727091Z graph_break []
2025-12-04T10:11:57.8727218Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8727901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8728005Z   if out == self.unknown_value:
2025-12-04T10:11:57.8728131Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8728220Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8728347Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8728888Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8728945Z graph_break []
2025-12-04T10:11:57.8729068Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8729159Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8729279Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8729851Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8729910Z graph_break []
2025-12-04T10:11:57.8730403Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3523d5aaa7729d0c.xml -
2025-12-04T10:11:57.8730512Z =========================== short test summary info ============================
2025-12-04T10:11:57.8731797Z FAILED [0.4949s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8731854Z 
2025-12-04T10:11:57.8731978Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8732492Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8732499Z 
2025-12-04T10:11:57.8732656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8732759Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8732877Z ================== 1 failed, 57 deselected, 2 rerun in 12.08s ==================
2025-12-04T10:11:57.8732935Z Got exit code 1
2025-12-04T10:11:57.8733406Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8733652Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.8733916Z W1204 10:02:37.864000 60162 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8734305Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-70de31050b612090.xml
2025-12-04T10:11:57.8734434Z ============================= test session starts ==============================
2025-12-04T10:11:57.8734645Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8734713Z cachedir: .pytest_cache
2025-12-04T10:11:57.8735135Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8735221Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8735286Z configfile: pytest.ini
2025-12-04T10:11:57.8735603Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8735735Z collecting ... collected 58 items / 46 deselected / 12 selected
2025-12-04T10:11:57.8735822Z stepcurrent: skipping 46 already run items.
2025-12-04T10:11:57.8735891Z Running 12 items in this shard
2025-12-04T10:11:57.8735895Z 
2025-12-04T10:11:57.8736406Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8851s] [  8%]
2025-12-04T10:11:57.8736894Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5062s] [  8%]
2025-12-04T10:11:57.8737387Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.5022s] [  8%]
2025-12-04T10:11:57.8737391Z 
2025-12-04T10:11:57.8737472Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8737773Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8737852Z Traceback (most recent call last):
2025-12-04T10:11:57.8738161Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8738226Z     method(*args, **kwargs)
2025-12-04T10:11:57.8738522Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8738620Z     method(*args, **kwargs)
2025-12-04T10:11:57.8738911Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8738971Z     with policy():
2025-12-04T10:11:57.8739269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8739334Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8740140Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8740148Z 
2025-12-04T10:11:57.8740269Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8740792Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8740797Z 
2025-12-04T10:11:57.8740964Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8741087Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8741186Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8741537Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8741696Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8741759Z graph_break []
2025-12-04T10:11:57.8742051Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8742157Z Traceback (most recent call last):
2025-12-04T10:11:57.8742458Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8742521Z     method(*args, **kwargs)
2025-12-04T10:11:57.8742815Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8742876Z     method(*args, **kwargs)
2025-12-04T10:11:57.8743163Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8743223Z     with policy():
2025-12-04T10:11:57.8743520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8743583Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8744409Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8744417Z 
2025-12-04T10:11:57.8744572Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8745098Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8745102Z 
2025-12-04T10:11:57.8745257Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8745391Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8745488Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8745833Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8746000Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8746060Z graph_break []
2025-12-04T10:11:57.8746184Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8746272Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8746390Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8746735Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8746795Z graph_break []
2025-12-04T10:11:57.8746878Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8747170Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8747244Z Traceback (most recent call last):
2025-12-04T10:11:57.8747543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8747606Z     method(*args, **kwargs)
2025-12-04T10:11:57.8747898Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8747966Z     method(*args, **kwargs)
2025-12-04T10:11:57.8748257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8748316Z     with policy():
2025-12-04T10:11:57.8748645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8748711Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8749539Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8749579Z 
2025-12-04T10:11:57.8749702Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8750223Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8750227Z 
2025-12-04T10:11:57.8750387Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8750510Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8750603Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8750943Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8751070Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8751129Z graph_break []
2025-12-04T10:11:57.8751289Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8751382Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8751500Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8751840Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8751902Z graph_break []
2025-12-04T10:11:57.8752022Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8752113Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8752247Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8752622Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8752684Z graph_break []
2025-12-04T10:11:57.8753166Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-70de31050b612090.xml -
2025-12-04T10:11:57.8753266Z =========================== short test summary info ============================
2025-12-04T10:11:57.8754564Z FAILED [0.5022s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8754571Z 
2025-12-04T10:11:57.8754692Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8755213Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8755217Z 
2025-12-04T10:11:57.8755367Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8755509Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8755627Z ================== 1 failed, 46 deselected, 2 rerun in 2.92s ===================
2025-12-04T10:11:57.8755683Z Got exit code 1
2025-12-04T10:11:57.8755752Z Retrying single test...
2025-12-04T10:11:57.8756010Z W1204 10:02:47.569000 60351 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8756430Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96a27193d0a2e839.xml
2025-12-04T10:11:57.8756525Z ============================= test session starts ==============================
2025-12-04T10:11:57.8756730Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8756812Z cachedir: .pytest_cache
2025-12-04T10:11:57.8757121Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8757202Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8757267Z configfile: pytest.ini
2025-12-04T10:11:57.8757580Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8757713Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8758335Z stepcurrent: skipping 46 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8758406Z Running 1 items in this shard
2025-12-04T10:11:57.8758413Z 
2025-12-04T10:11:57.8759143Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:02:48.651370774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8759147Z 
2025-12-04T10:11:57.8759442Z [W1204 10:02:57.673662749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8759449Z 
2025-12-04T10:11:57.8759739Z [W1204 10:02:57.673919823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8759777Z 
2025-12-04T10:11:57.8760119Z [W1204 10:02:57.679656281 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8760123Z 
2025-12-04T10:11:57.8760416Z [W1204 10:02:57.680278892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8760420Z 
2025-12-04T10:11:57.8760707Z [W1204 10:02:57.680466835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8760712Z 
2025-12-04T10:11:57.8761000Z [W1204 10:02:57.685859637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8761003Z 
2025-12-04T10:11:57.8761289Z [W1204 10:02:57.686369796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8761296Z 
2025-12-04T10:11:57.8761582Z [W1204 10:02:57.686533359 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8761585Z 
2025-12-04T10:11:57.8761665Z ('RERUN', {'yellow': True}) [10.9071s] [100%]
2025-12-04T10:11:57.8762392Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:02:58.894702226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8762441Z 
2025-12-04T10:11:57.8762728Z [W1204 10:02:58.895228435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8762732Z 
2025-12-04T10:11:57.8763030Z [W1204 10:02:58.895371568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8763068Z 
2025-12-04T10:11:57.8763358Z [W1204 10:02:58.898309378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8763362Z 
2025-12-04T10:11:57.8763647Z [W1204 10:02:58.898870827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8763650Z 
2025-12-04T10:11:57.8763942Z [W1204 10:02:58.899012990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8763945Z 
2025-12-04T10:11:57.8764234Z [W1204 10:02:58.903556167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8764237Z 
2025-12-04T10:11:57.8764529Z [W1204 10:02:58.904016405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8764535Z 
2025-12-04T10:11:57.8764819Z [W1204 10:02:58.904152868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8764856Z 
2025-12-04T10:11:57.8764939Z ('RERUN', {'yellow': True}) [0.4500s] [100%]
2025-12-04T10:11:57.8765662Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:02:59.341565586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8765666Z 
2025-12-04T10:11:57.8765957Z [W1204 10:02:59.342082994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8765965Z 
2025-12-04T10:11:57.8766256Z [W1204 10:02:59.342218857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8766293Z 
2025-12-04T10:11:57.8766582Z [W1204 10:02:59.345116526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8766586Z 
2025-12-04T10:11:57.8766874Z [W1204 10:02:59.345656735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8766877Z 
2025-12-04T10:11:57.8767165Z [W1204 10:02:59.345794088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8767168Z 
2025-12-04T10:11:57.8767460Z [W1204 10:02:59.350272464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8767464Z 
2025-12-04T10:11:57.8767756Z [W1204 10:02:59.350725642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8767762Z 
2025-12-04T10:11:57.8768052Z [W1204 10:02:59.350863504 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8768057Z 
2025-12-04T10:11:57.8768118Z FAILED [0.4474s] [100%]
2025-12-04T10:11:57.8768122Z 
2025-12-04T10:11:57.8768206Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8768505Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8768578Z Traceback (most recent call last):
2025-12-04T10:11:57.8768924Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8768991Z     method(*args, **kwargs)
2025-12-04T10:11:57.8769281Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8769383Z     method(*args, **kwargs)
2025-12-04T10:11:57.8769671Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8769731Z     with policy():
2025-12-04T10:11:57.8770028Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8770093Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8770898Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8770902Z 
2025-12-04T10:11:57.8771027Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8771551Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8771557Z 
2025-12-04T10:11:57.8771748Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8771889Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8771985Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8772337Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8772465Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8772524Z graph_break []
2025-12-04T10:11:57.8772647Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8773348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8773456Z   if out == self.unknown_value:
2025-12-04T10:11:57.8773754Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8773827Z Traceback (most recent call last):
2025-12-04T10:11:57.8774122Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8774190Z     method(*args, **kwargs)
2025-12-04T10:11:57.8774482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8778581Z     method(*args, **kwargs)
2025-12-04T10:11:57.8778961Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8779043Z     with policy():
2025-12-04T10:11:57.8779371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8779450Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8780294Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8780300Z 
2025-12-04T10:11:57.8780522Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8781066Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8781112Z 
2025-12-04T10:11:57.8781283Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8781427Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8781529Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8781888Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8782024Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8782085Z graph_break []
2025-12-04T10:11:57.8782223Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8782937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8783016Z   if out == self.unknown_value:
2025-12-04T10:11:57.8783146Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8783282Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8783416Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8783775Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8783836Z graph_break []
2025-12-04T10:11:57.8783927Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8784231Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8784309Z Traceback (most recent call last):
2025-12-04T10:11:57.8784625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8784730Z     method(*args, **kwargs)
2025-12-04T10:11:57.8785032Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8785095Z     method(*args, **kwargs)
2025-12-04T10:11:57.8785386Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8785449Z     with policy():
2025-12-04T10:11:57.8785752Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8785821Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8786654Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8786662Z 
2025-12-04T10:11:57.8786805Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8787337Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8787341Z 
2025-12-04T10:11:57.8787502Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8787670Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8787768Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8788118Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8788301Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8788359Z graph_break []
2025-12-04T10:11:57.8788486Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8789180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8789249Z   if out == self.unknown_value:
2025-12-04T10:11:57.8789374Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8789465Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8789588Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8789929Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8789988Z graph_break []
2025-12-04T10:11:57.8790112Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8790233Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8790355Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8790698Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8790756Z graph_break []
2025-12-04T10:11:57.8791249Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96a27193d0a2e839.xml -
2025-12-04T10:11:57.8791353Z =========================== short test summary info ============================
2025-12-04T10:11:57.8792665Z FAILED [0.4474s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8792706Z 
2025-12-04T10:11:57.8792834Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8793357Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8793363Z 
2025-12-04T10:11:57.8793519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8793636Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8793760Z ================== 1 failed, 57 deselected, 2 rerun in 11.83s ==================
2025-12-04T10:11:57.8793817Z Got exit code 1
2025-12-04T10:11:57.8793883Z Retrying single test...
2025-12-04T10:11:57.8794155Z W1204 10:03:05.926000 60544 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8794539Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc5a2675f46e34d3.xml
2025-12-04T10:11:57.8794636Z ============================= test session starts ==============================
2025-12-04T10:11:57.8794880Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8794947Z cachedir: .pytest_cache
2025-12-04T10:11:57.8795261Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8795373Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8795436Z configfile: pytest.ini
2025-12-04T10:11:57.8795756Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8795886Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8796458Z stepcurrent: skipping 46 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8796529Z Running 1 items in this shard
2025-12-04T10:11:57.8796534Z 
2025-12-04T10:11:57.8797272Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:03:07.018935365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8797279Z 
2025-12-04T10:11:57.8797577Z [W1204 10:03:16.103175713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8797615Z 
2025-12-04T10:11:57.8797905Z [W1204 10:03:16.103415017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8797912Z 
2025-12-04T10:11:57.8798197Z [W1204 10:03:16.109327078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8798201Z 
2025-12-04T10:11:57.8798488Z [W1204 10:03:16.109928018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8798491Z 
2025-12-04T10:11:57.8798777Z [W1204 10:03:16.110130522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8798815Z 
2025-12-04T10:11:57.8799104Z [W1204 10:03:16.115688946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8799107Z 
2025-12-04T10:11:57.8799395Z [W1204 10:03:16.116220806 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8799398Z 
2025-12-04T10:11:57.8799685Z [W1204 10:03:16.116392618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8799688Z 
2025-12-04T10:11:57.8799770Z ('RERUN', {'yellow': True}) [10.9829s] [100%]
2025-12-04T10:11:57.8800575Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:03:17.327093789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8800582Z 
2025-12-04T10:11:57.8800880Z [W1204 10:03:17.327627138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8800884Z 
2025-12-04T10:11:57.8801172Z [W1204 10:03:17.327770071 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8801176Z 
2025-12-04T10:11:57.8801460Z [W1204 10:03:17.330773202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8801466Z 
2025-12-04T10:11:57.8801789Z [W1204 10:03:17.331343092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8801793Z 
2025-12-04T10:11:57.8802079Z [W1204 10:03:17.331481404 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8802117Z 
2025-12-04T10:11:57.8802425Z [W1204 10:03:17.336092403 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8802429Z 
2025-12-04T10:11:57.8802727Z [W1204 10:03:17.336570891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8802730Z 
2025-12-04T10:11:57.8803024Z [W1204 10:03:17.336711313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8803028Z 
2025-12-04T10:11:57.8803107Z ('RERUN', {'yellow': True}) [0.4521s] [100%]
2025-12-04T10:11:57.8803851Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:03:17.778970486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8803858Z 
2025-12-04T10:11:57.8804150Z [W1204 10:03:17.779491975 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8804153Z 
2025-12-04T10:11:57.8804475Z [W1204 10:03:17.779637507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8804482Z 
2025-12-04T10:11:57.8804772Z [W1204 10:03:17.782599498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8804776Z 
2025-12-04T10:11:57.8805067Z [W1204 10:03:17.783158828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8805070Z 
2025-12-04T10:11:57.8805362Z [W1204 10:03:17.783297440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8805411Z 
2025-12-04T10:11:57.8805706Z [W1204 10:03:17.787808107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8805709Z 
2025-12-04T10:11:57.8806000Z [W1204 10:03:17.788263525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8806003Z 
2025-12-04T10:11:57.8806290Z [W1204 10:03:17.788408387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8806293Z 
2025-12-04T10:11:57.8806356Z FAILED [0.4504s] [100%]
2025-12-04T10:11:57.8806360Z 
2025-12-04T10:11:57.8806445Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8806744Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8806833Z Traceback (most recent call last):
2025-12-04T10:11:57.8807155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8807221Z     method(*args, **kwargs)
2025-12-04T10:11:57.8807516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8807583Z     method(*args, **kwargs)
2025-12-04T10:11:57.8807878Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8807938Z     with policy():
2025-12-04T10:11:57.8808269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8808338Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8809148Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8809187Z 
2025-12-04T10:11:57.8809323Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8809851Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8809854Z 
2025-12-04T10:11:57.8810024Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8810156Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8810250Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8810607Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8810738Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8810800Z graph_break []
2025-12-04T10:11:57.8810957Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8811656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8811728Z   if out == self.unknown_value:
2025-12-04T10:11:57.8812024Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8812097Z Traceback (most recent call last):
2025-12-04T10:11:57.8812397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8812460Z     method(*args, **kwargs)
2025-12-04T10:11:57.8812788Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8812851Z     method(*args, **kwargs)
2025-12-04T10:11:57.8813139Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8813200Z     with policy():
2025-12-04T10:11:57.8813492Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8813560Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8814383Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8814389Z 
2025-12-04T10:11:57.8814515Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8815046Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8815050Z 
2025-12-04T10:11:57.8815208Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8815335Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8815436Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8815839Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8815970Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8816061Z graph_break []
2025-12-04T10:11:57.8816190Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8816884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8816952Z   if out == self.unknown_value:
2025-12-04T10:11:57.8817260Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8817353Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8817479Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8817827Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8817885Z graph_break []
2025-12-04T10:11:57.8817983Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8818281Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.8818424Z Traceback (most recent call last):
2025-12-04T10:11:57.8818733Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8818795Z     method(*args, **kwargs)
2025-12-04T10:11:57.8819089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8819152Z     method(*args, **kwargs)
2025-12-04T10:11:57.8819444Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8819506Z     with policy():
2025-12-04T10:11:57.8819795Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8819921Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8820746Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8820750Z 
2025-12-04T10:11:57.8820875Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8821411Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8821416Z 
2025-12-04T10:11:57.8821572Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8821701Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8821792Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8822133Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8822261Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8822318Z graph_break []
2025-12-04T10:11:57.8822443Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8823181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8823251Z   if out == self.unknown_value:
2025-12-04T10:11:57.8823377Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8823510Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8823630Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8824464Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8824523Z graph_break []
2025-12-04T10:11:57.8824647Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8824732Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8824854Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8825193Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8825249Z graph_break []
2025-12-04T10:11:57.8825745Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc5a2675f46e34d3.xml -
2025-12-04T10:11:57.8825886Z =========================== short test summary info ============================
2025-12-04T10:11:57.8827185Z FAILED [0.4504s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8827194Z 
2025-12-04T10:11:57.8827316Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8827837Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8827875Z 
2025-12-04T10:11:57.8828036Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8828139Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8828254Z ================== 1 failed, 57 deselected, 2 rerun in 11.91s ==================
2025-12-04T10:11:57.8828313Z Got exit code 1
2025-12-04T10:11:57.8828788Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.8829036Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.8829298Z W1204 10:03:24.328000 60737 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8829688Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-37ac0a15b5eff353.xml
2025-12-04T10:11:57.8829786Z ============================= test session starts ==============================
2025-12-04T10:11:57.8829998Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8830066Z cachedir: .pytest_cache
2025-12-04T10:11:57.8830380Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8830458Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8830563Z configfile: pytest.ini
2025-12-04T10:11:57.8830880Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8831012Z collecting ... collected 58 items / 47 deselected / 11 selected
2025-12-04T10:11:57.8831132Z stepcurrent: skipping 47 already run items.
2025-12-04T10:11:57.8831202Z Running 11 items in this shard
2025-12-04T10:11:57.8831206Z 
2025-12-04T10:11:57.8831704Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8700s] [  9%]
2025-12-04T10:11:57.8832188Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4602s] [  9%]
2025-12-04T10:11:57.8832632Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4700s] [  9%]
2025-12-04T10:11:57.8832637Z 
2025-12-04T10:11:57.8832717Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8833018Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8833093Z Traceback (most recent call last):
2025-12-04T10:11:57.8833436Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8833515Z     method(*args, **kwargs)
2025-12-04T10:11:57.8833813Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8833876Z     method(*args, **kwargs)
2025-12-04T10:11:57.8834171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8834230Z     with policy():
2025-12-04T10:11:57.8834529Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8834595Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8835425Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8835430Z 
2025-12-04T10:11:57.8835556Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8836076Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8836081Z 
2025-12-04T10:11:57.8836241Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8836366Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8836463Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8836817Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8836944Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8837003Z graph_break []
2025-12-04T10:11:57.8837292Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8837363Z Traceback (most recent call last):
2025-12-04T10:11:57.8837700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8837763Z     method(*args, **kwargs)
2025-12-04T10:11:57.8838060Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8838122Z     method(*args, **kwargs)
2025-12-04T10:11:57.8838446Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8838508Z     with policy():
2025-12-04T10:11:57.8838803Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8838867Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8839678Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8839682Z 
2025-12-04T10:11:57.8839802Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8840375Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8840382Z 
2025-12-04T10:11:57.8840539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8840705Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8840797Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8841140Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8841268Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8841326Z graph_break []
2025-12-04T10:11:57.8841450Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8841540Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8841661Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8842058Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8842117Z graph_break []
2025-12-04T10:11:57.8842198Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8842489Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8842560Z Traceback (most recent call last):
2025-12-04T10:11:57.8842861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8842941Z     method(*args, **kwargs)
2025-12-04T10:11:57.8843234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8843299Z     method(*args, **kwargs)
2025-12-04T10:11:57.8843591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8843649Z     with policy():
2025-12-04T10:11:57.8843944Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8844013Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8844862Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8844867Z 
2025-12-04T10:11:57.8844989Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8845506Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8845548Z 
2025-12-04T10:11:57.8845704Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8845828Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8845918Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8846261Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8846382Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8846443Z graph_break []
2025-12-04T10:11:57.8846564Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8846655Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8846774Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8847114Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8847207Z graph_break []
2025-12-04T10:11:57.8847329Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8847417Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8847538Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8847877Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8847940Z graph_break []
2025-12-04T10:11:57.8848429Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-37ac0a15b5eff353.xml -
2025-12-04T10:11:57.8848563Z =========================== short test summary info ============================
2025-12-04T10:11:57.8849844Z FAILED [0.4700s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8849849Z 
2025-12-04T10:11:57.8849970Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8850486Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8850491Z 
2025-12-04T10:11:57.8850646Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8850763Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8850881Z ================== 1 failed, 47 deselected, 2 rerun in 2.82s ===================
2025-12-04T10:11:57.8850943Z Got exit code 1
2025-12-04T10:11:57.8851016Z Retrying single test...
2025-12-04T10:11:57.8851280Z W1204 10:03:33.984000 60918 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8851786Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a5da48d7d65453d4.xml
2025-12-04T10:11:57.8851890Z ============================= test session starts ==============================
2025-12-04T10:11:57.8852097Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8852199Z cachedir: .pytest_cache
2025-12-04T10:11:57.8852506Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8852582Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8852655Z configfile: pytest.ini
2025-12-04T10:11:57.8852978Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8853110Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8853683Z stepcurrent: skipping 47 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8853753Z Running 1 items in this shard
2025-12-04T10:11:57.8853757Z 
2025-12-04T10:11:57.8854487Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:03:35.036679337 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8854494Z 
2025-12-04T10:11:57.8854831Z [W1204 10:03:44.962890384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8854835Z 
2025-12-04T10:11:57.8855128Z [W1204 10:03:44.963151158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8855132Z 
2025-12-04T10:11:57.8855422Z [W1204 10:03:44.969067388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8855425Z 
2025-12-04T10:11:57.8855714Z [W1204 10:03:44.969650748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8855753Z 
2025-12-04T10:11:57.8856042Z [W1204 10:03:44.969817831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8856045Z 
2025-12-04T10:11:57.8856335Z [W1204 10:03:44.975333125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8856339Z 
2025-12-04T10:11:57.8856625Z [W1204 10:03:44.975913185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8856629Z 
2025-12-04T10:11:57.8856918Z [W1204 10:03:44.976082158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8856921Z 
2025-12-04T10:11:57.8857004Z ('RERUN', {'yellow': True}) [10.7856s] [100%]
2025-12-04T10:11:57.8857725Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:03:45.146966674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8857731Z 
2025-12-04T10:11:57.8858025Z [W1204 10:03:45.147516554 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8858029Z 
2025-12-04T10:11:57.8858313Z [W1204 10:03:45.147654936 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8858316Z 
2025-12-04T10:11:57.8858646Z [W1204 10:03:45.150583406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8858650Z 
2025-12-04T10:11:57.8858939Z [W1204 10:03:45.151133305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8858975Z 
2025-12-04T10:11:57.8859269Z [W1204 10:03:45.151271268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8859272Z 
2025-12-04T10:11:57.8859562Z [W1204 10:03:45.155762335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8859565Z 
2025-12-04T10:11:57.8859855Z [W1204 10:03:45.156219002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8859858Z 
2025-12-04T10:11:57.8860144Z [W1204 10:03:45.156362925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8860147Z 
2025-12-04T10:11:57.8860225Z ('RERUN', {'yellow': True}) [0.4077s] [100%]
2025-12-04T10:11:57.8860947Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:03:45.551676282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8860954Z 
2025-12-04T10:11:57.8861285Z [W1204 10:03:45.552225262 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8861290Z 
2025-12-04T10:11:57.8861584Z [W1204 10:03:45.552375454 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8861588Z 
2025-12-04T10:11:57.8861879Z [W1204 10:03:45.555250194 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8861883Z 
2025-12-04T10:11:57.8862173Z [W1204 10:03:45.555794883 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8862177Z 
2025-12-04T10:11:57.8862464Z [W1204 10:03:45.555943665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8862501Z 
2025-12-04T10:11:57.8862795Z [W1204 10:03:45.560461392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8862799Z 
2025-12-04T10:11:57.8863086Z [W1204 10:03:45.560917490 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8863089Z 
2025-12-04T10:11:57.8863381Z [W1204 10:03:45.561055153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8863385Z 
2025-12-04T10:11:57.8863446Z FAILED [0.4051s] [100%]
2025-12-04T10:11:57.8863449Z 
2025-12-04T10:11:57.8863535Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8863833Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8863909Z Traceback (most recent call last):
2025-12-04T10:11:57.8864228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8864293Z     method(*args, **kwargs)
2025-12-04T10:11:57.8864587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8864649Z     method(*args, **kwargs)
2025-12-04T10:11:57.8864938Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8865036Z     with policy():
2025-12-04T10:11:57.8865335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8865400Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8866238Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8866244Z 
2025-12-04T10:11:57.8866374Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8866898Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8866902Z 
2025-12-04T10:11:57.8867069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8867204Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8867299Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8867649Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8867838Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8867900Z graph_break []
2025-12-04T10:11:57.8868026Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8868725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8868797Z   if out == self.unknown_value:
2025-12-04T10:11:57.8869095Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8869166Z Traceback (most recent call last):
2025-12-04T10:11:57.8869472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8869574Z     method(*args, **kwargs)
2025-12-04T10:11:57.8869868Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8869933Z     method(*args, **kwargs)
2025-12-04T10:11:57.8870225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8870283Z     with policy():
2025-12-04T10:11:57.8870584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8870649Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8871460Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8871467Z 
2025-12-04T10:11:57.8871594Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8872115Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8872121Z 
2025-12-04T10:11:57.8872277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8872441Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8872541Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8872891Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8873054Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8873115Z graph_break []
2025-12-04T10:11:57.8873238Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8873937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8874005Z   if out == self.unknown_value:
2025-12-04T10:11:57.8874126Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8874222Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8874343Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8874689Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8874750Z graph_break []
2025-12-04T10:11:57.8874833Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8875166Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8875239Z Traceback (most recent call last):
2025-12-04T10:11:57.8875544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8875614Z     method(*args, **kwargs)
2025-12-04T10:11:57.8875908Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8875973Z     method(*args, **kwargs)
2025-12-04T10:11:57.8876261Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8876356Z     with policy():
2025-12-04T10:11:57.8876653Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8876717Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8877527Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8877535Z 
2025-12-04T10:11:57.8877665Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8878183Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8878189Z 
2025-12-04T10:11:57.8878349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8878474Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8878567Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8878909Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8879033Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8879095Z graph_break []
2025-12-04T10:11:57.8879217Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8879988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8880096Z   if out == self.unknown_value:
2025-12-04T10:11:57.8880218Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8880309Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8880435Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8880774Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8880835Z graph_break []
2025-12-04T10:11:57.8880955Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8881047Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8881167Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8881521Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8881591Z graph_break []
2025-12-04T10:11:57.8882114Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a5da48d7d65453d4.xml -
2025-12-04T10:11:57.8882218Z =========================== short test summary info ============================
2025-12-04T10:11:57.8883497Z FAILED [0.4051s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8883502Z 
2025-12-04T10:11:57.8883634Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8884186Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8884190Z 
2025-12-04T10:11:57.8884349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8884455Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8884570Z ================== 1 failed, 57 deselected, 2 rerun in 11.62s ==================
2025-12-04T10:11:57.8884633Z Got exit code 1
2025-12-04T10:11:57.8884699Z Retrying single test...
2025-12-04T10:11:57.8884961Z W1204 10:03:52.170000 61104 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8885352Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dba8e879764f929.xml
2025-12-04T10:11:57.8885449Z ============================= test session starts ==============================
2025-12-04T10:11:57.8885662Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8885730Z cachedir: .pytest_cache
2025-12-04T10:11:57.8886033Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8886112Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8886180Z configfile: pytest.ini
2025-12-04T10:11:57.8886532Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8886667Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8887235Z stepcurrent: skipping 47 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8887344Z Running 1 items in this shard
2025-12-04T10:11:57.8887347Z 
2025-12-04T10:11:57.8888075Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:03:53.220567579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8888079Z 
2025-12-04T10:11:57.8888377Z [W1204 10:04:02.151917132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8888385Z 
2025-12-04T10:11:57.8888674Z [W1204 10:04:02.152181576 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8888678Z 
2025-12-04T10:11:57.8888966Z [W1204 10:04:02.158054136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8888972Z 
2025-12-04T10:11:57.8889297Z [W1204 10:04:02.158642337 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8889301Z 
2025-12-04T10:11:57.8889589Z [W1204 10:04:02.158810269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8889592Z 
2025-12-04T10:11:57.8889879Z [W1204 10:04:02.164341604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8889883Z 
2025-12-04T10:11:57.8890170Z [W1204 10:04:02.164911524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8890173Z 
2025-12-04T10:11:57.8890464Z [W1204 10:04:02.165087417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8890502Z 
2025-12-04T10:11:57.8890584Z ('RERUN', {'yellow': True}) [10.7817s] [100%]
2025-12-04T10:11:57.8891312Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:04:03.327605421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8891318Z 
2025-12-04T10:11:57.8891605Z [W1204 10:04:03.328161791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8891608Z 
2025-12-04T10:11:57.8891895Z [W1204 10:04:03.328308424 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8891901Z 
2025-12-04T10:11:57.8892186Z [W1204 10:04:03.331226553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8892192Z 
2025-12-04T10:11:57.8892482Z [W1204 10:04:03.331774182 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8892486Z 
2025-12-04T10:11:57.8892775Z [W1204 10:04:03.331911605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8892779Z 
2025-12-04T10:11:57.8893067Z [W1204 10:04:03.336338740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8893071Z 
2025-12-04T10:11:57.8893393Z [W1204 10:04:03.336790128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8893397Z 
2025-12-04T10:11:57.8893684Z [W1204 10:04:03.336925990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8893738Z 
2025-12-04T10:11:57.8893821Z ('RERUN', {'yellow': True}) [0.4087s] [100%]
2025-12-04T10:11:57.8894554Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:04:03.734393409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8894558Z 
2025-12-04T10:11:57.8894848Z [W1204 10:04:03.734951398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8894855Z 
2025-12-04T10:11:57.8895142Z [W1204 10:04:03.735098841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8895146Z 
2025-12-04T10:11:57.8895434Z [W1204 10:04:03.737975630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8895440Z 
2025-12-04T10:11:57.8895728Z [W1204 10:04:03.738519659 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8895768Z 
2025-12-04T10:11:57.8896056Z [W1204 10:04:03.738665691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8896059Z 
2025-12-04T10:11:57.8896350Z [W1204 10:04:03.743114147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8896354Z 
2025-12-04T10:11:57.8896641Z [W1204 10:04:03.743564805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8896644Z 
2025-12-04T10:11:57.8896930Z [W1204 10:04:03.743701268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8896968Z 
2025-12-04T10:11:57.8897029Z FAILED [0.4070s] [100%]
2025-12-04T10:11:57.8897032Z 
2025-12-04T10:11:57.8897115Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8897413Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8897485Z Traceback (most recent call last):
2025-12-04T10:11:57.8897796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8897860Z     method(*args, **kwargs)
2025-12-04T10:11:57.8898157Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8898221Z     method(*args, **kwargs)
2025-12-04T10:11:57.8898509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8898573Z     with policy():
2025-12-04T10:11:57.8898867Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8898935Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8899731Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8899735Z 
2025-12-04T10:11:57.8899896Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8900417Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8900453Z 
2025-12-04T10:11:57.8900613Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8900738Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8900836Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8901180Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8901309Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8901367Z graph_break []
2025-12-04T10:11:57.8901492Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8902189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8902260Z   if out == self.unknown_value:
2025-12-04T10:11:57.8902552Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8902657Z Traceback (most recent call last):
2025-12-04T10:11:57.8902956Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8903021Z     method(*args, **kwargs)
2025-12-04T10:11:57.8903313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8903375Z     method(*args, **kwargs)
2025-12-04T10:11:57.8903669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8903728Z     with policy():
2025-12-04T10:11:57.8904027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8904127Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8904942Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8904949Z 
2025-12-04T10:11:57.8905077Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8905596Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8905600Z 
2025-12-04T10:11:57.8905758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8905879Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8905974Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8906324Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8906451Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8906511Z graph_break []
2025-12-04T10:11:57.8906633Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8907357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8907429Z   if out == self.unknown_value:
2025-12-04T10:11:57.8907549Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8907673Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8907793Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8908134Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8908196Z graph_break []
2025-12-04T10:11:57.8908281Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8908568Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.8908641Z Traceback (most recent call last):
2025-12-04T10:11:57.8908936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8909007Z     method(*args, **kwargs)
2025-12-04T10:11:57.8909300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8909365Z     method(*args, **kwargs)
2025-12-04T10:11:57.8909693Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8909752Z     with policy():
2025-12-04T10:11:57.8910051Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8910115Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8910924Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8910927Z 
2025-12-04T10:11:57.8911054Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8911608Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8911613Z 
2025-12-04T10:11:57.8911771Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8911892Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8911979Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8912332Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8912453Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8912513Z graph_break []
2025-12-04T10:11:57.8912632Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8913324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8913397Z   if out == self.unknown_value:
2025-12-04T10:11:57.8913518Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8913608Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8913732Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8914108Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8914170Z graph_break []
2025-12-04T10:11:57.8914292Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8914412Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8914536Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8914874Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8914932Z graph_break []
2025-12-04T10:11:57.8915417Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dba8e879764f929.xml -
2025-12-04T10:11:57.8915528Z =========================== short test summary info ============================
2025-12-04T10:11:57.8916810Z FAILED [0.4070s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8916817Z 
2025-12-04T10:11:57.8916979Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8917671Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8917675Z 
2025-12-04T10:11:57.8917832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8917942Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8918057Z ================== 1 failed, 57 deselected, 2 rerun in 11.62s ==================
2025-12-04T10:11:57.8918114Z Got exit code 1
2025-12-04T10:11:57.8918592Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.8918904Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.8919173Z W1204 10:04:10.305000 61290 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8919559Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0d4d42e91b0ff091.xml
2025-12-04T10:11:57.8919654Z ============================= test session starts ==============================
2025-12-04T10:11:57.8919940Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8920008Z cachedir: .pytest_cache
2025-12-04T10:11:57.8920319Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8920400Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8920464Z configfile: pytest.ini
2025-12-04T10:11:57.8920782Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8920909Z collecting ... collected 58 items / 48 deselected / 10 selected
2025-12-04T10:11:57.8920995Z stepcurrent: skipping 48 already run items.
2025-12-04T10:11:57.8921067Z Running 10 items in this shard
2025-12-04T10:11:57.8921071Z 
2025-12-04T10:11:57.8921642Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9285s] [ 10%]
2025-12-04T10:11:57.8922134Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5345s] [ 10%]
2025-12-04T10:11:57.8922622Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.5538s] [ 10%]
2025-12-04T10:11:57.8922627Z 
2025-12-04T10:11:57.8922712Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8923001Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8923073Z Traceback (most recent call last):
2025-12-04T10:11:57.8923387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8923463Z     method(*args, **kwargs)
2025-12-04T10:11:57.8923761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8923830Z     method(*args, **kwargs)
2025-12-04T10:11:57.8924124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8924184Z     with policy():
2025-12-04T10:11:57.8924532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8924599Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8925397Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8925401Z 
2025-12-04T10:11:57.8925536Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8926061Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8926100Z 
2025-12-04T10:11:57.8926258Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8926385Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8926484Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8927029Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8927160Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8927219Z graph_break []
2025-12-04T10:11:57.8927510Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8927587Z Traceback (most recent call last):
2025-12-04T10:11:57.8927883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8927949Z     method(*args, **kwargs)
2025-12-04T10:11:57.8928240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8928306Z     method(*args, **kwargs)
2025-12-04T10:11:57.8928608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8928665Z     with policy():
2025-12-04T10:11:57.8928992Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8929060Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8929867Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8929907Z 
2025-12-04T10:11:57.8930037Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8930553Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8930557Z 
2025-12-04T10:11:57.8930718Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8930842Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8930933Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8931476Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8931635Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8931698Z graph_break []
2025-12-04T10:11:57.8931820Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8931908Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8932028Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8932567Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8932623Z graph_break []
2025-12-04T10:11:57.8932708Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8933031Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8933107Z Traceback (most recent call last):
2025-12-04T10:11:57.8933407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8933473Z     method(*args, **kwargs)
2025-12-04T10:11:57.8933775Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8933839Z     method(*args, **kwargs)
2025-12-04T10:11:57.8934135Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8934193Z     with policy():
2025-12-04T10:11:57.8934489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8934556Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8935367Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8935371Z 
2025-12-04T10:11:57.8935497Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8936045Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8936050Z 
2025-12-04T10:11:57.8936206Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8936332Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8936454Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8936997Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8937118Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8937175Z graph_break []
2025-12-04T10:11:57.8937300Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8937385Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8937505Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8938048Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8938119Z graph_break []
2025-12-04T10:11:57.8938245Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8938331Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8938483Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8939020Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8939078Z graph_break []
2025-12-04T10:11:57.8939567Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0d4d42e91b0ff091.xml -
2025-12-04T10:11:57.8939665Z =========================== short test summary info ============================
2025-12-04T10:11:57.8940945Z FAILED [0.5538s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8940982Z 
2025-12-04T10:11:57.8941105Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8941623Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8941629Z 
2025-12-04T10:11:57.8941785Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8941888Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8942005Z ================== 1 failed, 48 deselected, 2 rerun in 3.04s ===================
2025-12-04T10:11:57.8942063Z Got exit code 1
2025-12-04T10:11:57.8942129Z Retrying single test...
2025-12-04T10:11:57.8942390Z W1204 10:04:19.970000 61472 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8942774Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-659fbe1db9f9f989.xml
2025-12-04T10:11:57.8942871Z ============================= test session starts ==============================
2025-12-04T10:11:57.8943110Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8943177Z cachedir: .pytest_cache
2025-12-04T10:11:57.8943484Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8943594Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8943659Z configfile: pytest.ini
2025-12-04T10:11:57.8943977Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8944105Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8944676Z stepcurrent: skipping 48 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8944746Z Running 1 items in this shard
2025-12-04T10:11:57.8944750Z 
2025-12-04T10:11:57.8945475Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:04:21.553416183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8945485Z 
2025-12-04T10:11:57.8945820Z [W1204 10:04:30.555430353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8945824Z 
2025-12-04T10:11:57.8946113Z [W1204 10:04:30.555683357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8946120Z 
2025-12-04T10:11:57.8946414Z [W1204 10:04:30.561745621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8946417Z 
2025-12-04T10:11:57.8946707Z [W1204 10:04:30.562347121 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8946711Z 
2025-12-04T10:11:57.8947003Z [W1204 10:04:30.562527714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8947057Z 
2025-12-04T10:11:57.8947345Z [W1204 10:04:30.567996727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8947350Z 
2025-12-04T10:11:57.8947641Z [W1204 10:04:30.568540006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8947644Z 
2025-12-04T10:11:57.8947931Z [W1204 10:04:30.568700649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8947935Z 
2025-12-04T10:11:57.8948017Z ('RERUN', {'yellow': True}) [10.9380s] [100%]
2025-12-04T10:11:57.8948741Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:04:31.363507874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8948748Z 
2025-12-04T10:11:57.8949035Z [W1204 10:04:31.364054814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8949041Z 
2025-12-04T10:11:57.8949328Z [W1204 10:04:31.364193516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8949331Z 
2025-12-04T10:11:57.8949615Z [W1204 10:04:31.367100486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8949619Z 
2025-12-04T10:11:57.8949941Z [W1204 10:04:31.367542894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8949945Z 
2025-12-04T10:11:57.8950234Z [W1204 10:04:31.367680286 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8950272Z 
2025-12-04T10:11:57.8950564Z [W1204 10:04:31.372203253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8950568Z 
2025-12-04T10:11:57.8950866Z [W1204 10:04:31.372668701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8950870Z 
2025-12-04T10:11:57.8951164Z [W1204 10:04:31.372803163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8951167Z 
2025-12-04T10:11:57.8951248Z ('RERUN', {'yellow': True}) [0.4916s] [100%]
2025-12-04T10:11:57.8951970Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:04:31.853352985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8951979Z 
2025-12-04T10:11:57.8952264Z [W1204 10:04:31.853890874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8952268Z 
2025-12-04T10:11:57.8952588Z [W1204 10:04:31.854031006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8952596Z 
2025-12-04T10:11:57.8952883Z [W1204 10:04:31.856915786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8952886Z 
2025-12-04T10:11:57.8953173Z [W1204 10:04:31.857361433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8953176Z 
2025-12-04T10:11:57.8953466Z [W1204 10:04:31.857499405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8953505Z 
2025-12-04T10:11:57.8953795Z [W1204 10:04:31.862038953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8953798Z 
2025-12-04T10:11:57.8954093Z [W1204 10:04:31.862495101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8954097Z 
2025-12-04T10:11:57.8954383Z [W1204 10:04:31.862631553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8954386Z 
2025-12-04T10:11:57.8954448Z FAILED [0.4925s] [100%]
2025-12-04T10:11:57.8954451Z 
2025-12-04T10:11:57.8954536Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8954828Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8954906Z Traceback (most recent call last):
2025-12-04T10:11:57.8955218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8955285Z     method(*args, **kwargs)
2025-12-04T10:11:57.8955589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8955653Z     method(*args, **kwargs)
2025-12-04T10:11:57.8955947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8956005Z     with policy():
2025-12-04T10:11:57.8956335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8956405Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8957199Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8957237Z 
2025-12-04T10:11:57.8957367Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8957885Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8957889Z 
2025-12-04T10:11:57.8958049Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8958185Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8958281Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8958829Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8958959Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8959020Z graph_break []
2025-12-04T10:11:57.8959178Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8959918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8959992Z   if out == self.unknown_value:
2025-12-04T10:11:57.8960289Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8960363Z Traceback (most recent call last):
2025-12-04T10:11:57.8960663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8960841Z     method(*args, **kwargs)
2025-12-04T10:11:57.8961137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8961200Z     method(*args, **kwargs)
2025-12-04T10:11:57.8961490Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8961549Z     with policy():
2025-12-04T10:11:57.8961843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8961907Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8962721Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8962728Z 
2025-12-04T10:11:57.8962852Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8963373Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8963376Z 
2025-12-04T10:11:57.8963536Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8963670Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8963799Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8964343Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8964508Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8964569Z graph_break []
2025-12-04T10:11:57.8964697Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8965392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8965464Z   if out == self.unknown_value:
2025-12-04T10:11:57.8965590Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8965687Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8965815Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8966353Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8966414Z graph_break []
2025-12-04T10:11:57.8966546Z =================================== FAILURES ===================================
2025-12-04T10:11:57.8966843Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8966917Z Traceback (most recent call last):
2025-12-04T10:11:57.8967221Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8967287Z     method(*args, **kwargs)
2025-12-04T10:11:57.8967583Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8967645Z     method(*args, **kwargs)
2025-12-04T10:11:57.8967936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8968039Z     with policy():
2025-12-04T10:11:57.8968334Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8968402Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8969214Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8969218Z 
2025-12-04T10:11:57.8969343Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8969866Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8969872Z 
2025-12-04T10:11:57.8970031Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8970158Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8970247Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8970790Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8970919Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8971012Z graph_break []
2025-12-04T10:11:57.8971139Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8971831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8971933Z   if out == self.unknown_value:
2025-12-04T10:11:57.8972058Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8972150Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8972277Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8972815Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8972873Z graph_break []
2025-12-04T10:11:57.8972998Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8973085Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8973206Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8973795Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8973855Z graph_break []
2025-12-04T10:11:57.8974355Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-659fbe1db9f9f989.xml -
2025-12-04T10:11:57.8974458Z =========================== short test summary info ============================
2025-12-04T10:11:57.8975747Z FAILED [0.4925s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.8975787Z 
2025-12-04T10:11:57.8975913Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8976431Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8976436Z 
2025-12-04T10:11:57.8976592Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8976699Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.8976817Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ==================
2025-12-04T10:11:57.8976876Z Got exit code 1
2025-12-04T10:11:57.8976941Z Retrying single test...
2025-12-04T10:11:57.8977206Z W1204 10:04:38.407000 61658 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.8977591Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0f9da1e77120ab8a.xml
2025-12-04T10:11:57.8977685Z ============================= test session starts ==============================
2025-12-04T10:11:57.8977890Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.8977953Z cachedir: .pytest_cache
2025-12-04T10:11:57.8978299Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.8978378Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.8978444Z configfile: pytest.ini
2025-12-04T10:11:57.8978759Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.8978922Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.8979495Z stepcurrent: skipping 48 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8979565Z Running 1 items in this shard
2025-12-04T10:11:57.8979568Z 
2025-12-04T10:11:57.8980299Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:04:40.996144010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8980303Z 
2025-12-04T10:11:57.8980600Z [W1204 10:04:49.261777275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8980606Z 
2025-12-04T10:11:57.8980898Z [W1204 10:04:49.262038479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8980906Z 
2025-12-04T10:11:57.8981228Z [W1204 10:04:49.268750784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8981231Z 
2025-12-04T10:11:57.8981519Z [W1204 10:04:49.269357304 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8981522Z 
2025-12-04T10:11:57.8981813Z [W1204 10:04:49.269539797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8981816Z 
2025-12-04T10:11:57.8982117Z [W1204 10:04:49.275117143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8982122Z 
2025-12-04T10:11:57.8982448Z [W1204 10:04:49.275655082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8982451Z 
2025-12-04T10:11:57.8982739Z [W1204 10:04:49.275816115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8982742Z 
2025-12-04T10:11:57.8982826Z ('RERUN', {'yellow': True}) [11.2059s] [100%]
2025-12-04T10:11:57.8983549Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:04:50.073150382 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8983553Z 
2025-12-04T10:11:57.8983840Z [W1204 10:04:50.073712082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8983845Z 
2025-12-04T10:11:57.8984132Z [W1204 10:04:50.073862225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8984136Z 
2025-12-04T10:11:57.8984423Z [W1204 10:04:50.076833365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8984429Z 
2025-12-04T10:11:57.8984715Z [W1204 10:04:50.077283123 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8984718Z 
2025-12-04T10:11:57.8985036Z [W1204 10:04:50.077421975 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8985040Z 
2025-12-04T10:11:57.8985328Z [W1204 10:04:50.081996884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8985364Z 
2025-12-04T10:11:57.8985649Z [W1204 10:04:50.082458701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8985654Z 
2025-12-04T10:11:57.8985944Z [W1204 10:04:50.082594814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8985948Z 
2025-12-04T10:11:57.8986025Z ('RERUN', {'yellow': True}) [0.5010s] [100%]
2025-12-04T10:11:57.8986745Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:04:50.573192013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8986748Z 
2025-12-04T10:11:57.8987032Z [W1204 10:04:50.573729472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8987037Z 
2025-12-04T10:11:57.8987326Z [W1204 10:04:50.573871635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8987331Z 
2025-12-04T10:11:57.8987650Z [W1204 10:04:50.576744784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8987654Z 
2025-12-04T10:11:57.8987940Z [W1204 10:04:50.577189791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8987943Z 
2025-12-04T10:11:57.8988241Z [W1204 10:04:50.577329814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8988244Z 
2025-12-04T10:11:57.8988534Z [W1204 10:04:50.581867111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8988537Z 
2025-12-04T10:11:57.8988828Z [W1204 10:04:50.582327729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8988864Z 
2025-12-04T10:11:57.8989154Z [W1204 10:04:50.582462952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.8989158Z 
2025-12-04T10:11:57.8989220Z FAILED [0.4981s] [100%]
2025-12-04T10:11:57.8989223Z 
2025-12-04T10:11:57.8989303Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.8989595Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8989672Z Traceback (most recent call last):
2025-12-04T10:11:57.8989977Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8990049Z     method(*args, **kwargs)
2025-12-04T10:11:57.8990351Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8990413Z     method(*args, **kwargs)
2025-12-04T10:11:57.8990708Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8990766Z     with policy():
2025-12-04T10:11:57.8991192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8991321Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8992190Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.8992195Z 
2025-12-04T10:11:57.8992331Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8992907Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8992911Z 
2025-12-04T10:11:57.8993077Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8993204Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8993298Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8993849Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8993976Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8994040Z graph_break []
2025-12-04T10:11:57.8994164Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.8994891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.8994969Z   if out == self.unknown_value:
2025-12-04T10:11:57.8995262Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.8995336Z Traceback (most recent call last):
2025-12-04T10:11:57.8995635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8995697Z     method(*args, **kwargs)
2025-12-04T10:11:57.8995990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.8996091Z     method(*args, **kwargs)
2025-12-04T10:11:57.8996401Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.8996464Z     with policy():
2025-12-04T10:11:57.8996769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.8996837Z     raise RuntimeError(msg)
2025-12-04T10:11:57.8997657Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.8997662Z 
2025-12-04T10:11:57.8997789Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.8998312Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.8998319Z 
2025-12-04T10:11:57.8998481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.8998618Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.8998716Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.8999528Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.8999662Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.8999721Z graph_break []
2025-12-04T10:11:57.8999849Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9000684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9000754Z   if out == self.unknown_value:
2025-12-04T10:11:57.9000880Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9000973Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9001102Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9001648Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9001708Z graph_break []
2025-12-04T10:11:57.9001793Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9002087Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9002164Z Traceback (most recent call last):
2025-12-04T10:11:57.9002504Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9002569Z     method(*args, **kwargs)
2025-12-04T10:11:57.9002863Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9002924Z     method(*args, **kwargs)
2025-12-04T10:11:57.9003213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9003274Z     with policy():
2025-12-04T10:11:57.9003567Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9003672Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9004485Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9004489Z 
2025-12-04T10:11:57.9004614Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9005136Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9005140Z 
2025-12-04T10:11:57.9005299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9005428Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9005521Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9006070Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9006195Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9006252Z graph_break []
2025-12-04T10:11:57.9006388Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9007114Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9007188Z   if out == self.unknown_value:
2025-12-04T10:11:57.9007313Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9007446Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9007569Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9008115Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9008172Z graph_break []
2025-12-04T10:11:57.9008297Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9008384Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9008508Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9009042Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9009101Z graph_break []
2025-12-04T10:11:57.9009626Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0f9da1e77120ab8a.xml -
2025-12-04T10:11:57.9009727Z =========================== short test summary info ============================
2025-12-04T10:11:57.9011011Z FAILED [0.4981s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9011016Z 
2025-12-04T10:11:57.9011177Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9011697Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9011701Z 
2025-12-04T10:11:57.9011854Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9011956Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9012075Z ================== 1 failed, 57 deselected, 2 rerun in 12.23s ==================
2025-12-04T10:11:57.9012134Z Got exit code 1
2025-12-04T10:11:57.9012609Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9012850Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.9013116Z W1204 10:04:57.168000 61845 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9013509Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-91b8af2bf22e5dbf.xml
2025-12-04T10:11:57.9013602Z ============================= test session starts ==============================
2025-12-04T10:11:57.9013810Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9013876Z cachedir: .pytest_cache
2025-12-04T10:11:57.9014219Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9014298Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9014361Z configfile: pytest.ini
2025-12-04T10:11:57.9014675Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9014852Z collecting ... collected 58 items / 49 deselected / 9 selected
2025-12-04T10:11:57.9014938Z stepcurrent: skipping 49 already run items.
2025-12-04T10:11:57.9015011Z Running 9 items in this shard
2025-12-04T10:11:57.9015015Z 
2025-12-04T10:11:57.9015517Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.0089s] [ 11%]
2025-12-04T10:11:57.9016012Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6159s] [ 11%]
2025-12-04T10:11:57.9016463Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.6300s] [ 11%]
2025-12-04T10:11:57.9016469Z 
2025-12-04T10:11:57.9016553Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9016891Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9016965Z Traceback (most recent call last):
2025-12-04T10:11:57.9017463Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9017536Z     method(*args, **kwargs)
2025-12-04T10:11:57.9017835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9017901Z     method(*args, **kwargs)
2025-12-04T10:11:57.9018196Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9018255Z     with policy():
2025-12-04T10:11:57.9018559Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9018695Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9019515Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9019519Z 
2025-12-04T10:11:57.9019650Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9020176Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9020182Z 
2025-12-04T10:11:57.9020338Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9020468Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9020565Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9020917Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9021045Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9021108Z graph_break []
2025-12-04T10:11:57.9021406Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9021532Z Traceback (most recent call last):
2025-12-04T10:11:57.9021836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9021898Z     method(*args, **kwargs)
2025-12-04T10:11:57.9022250Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9022315Z     method(*args, **kwargs)
2025-12-04T10:11:57.9022609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9022669Z     with policy():
2025-12-04T10:11:57.9022966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9023038Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9023875Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9023879Z 
2025-12-04T10:11:57.9024009Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9024586Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9024590Z 
2025-12-04T10:11:57.9024750Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9024876Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9024967Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9025318Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9025441Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9025499Z graph_break []
2025-12-04T10:11:57.9025623Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9025746Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9025866Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9026208Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9026267Z graph_break []
2025-12-04T10:11:57.9026352Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9026647Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9026720Z Traceback (most recent call last):
2025-12-04T10:11:57.9027019Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9027082Z     method(*args, **kwargs)
2025-12-04T10:11:57.9027374Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9027443Z     method(*args, **kwargs)
2025-12-04T10:11:57.9027731Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9027790Z     with policy():
2025-12-04T10:11:57.9028093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9028159Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9029044Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9029081Z 
2025-12-04T10:11:57.9029207Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9029734Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9029738Z 
2025-12-04T10:11:57.9029891Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9030015Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9030107Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9030449Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9030574Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9030630Z graph_break []
2025-12-04T10:11:57.9030751Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9030840Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9030959Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9031340Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9031399Z graph_break []
2025-12-04T10:11:57.9031521Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9031609Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9031729Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9032068Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9032131Z graph_break []
2025-12-04T10:11:57.9032662Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-91b8af2bf22e5dbf.xml -
2025-12-04T10:11:57.9032767Z =========================== short test summary info ============================
2025-12-04T10:11:57.9034089Z FAILED [0.6300s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9034094Z 
2025-12-04T10:11:57.9034226Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9034758Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9034763Z 
2025-12-04T10:11:57.9034919Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9035026Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9035139Z ================== 1 failed, 49 deselected, 2 rerun in 3.28s ===================
2025-12-04T10:11:57.9035200Z Got exit code 1
2025-12-04T10:11:57.9035264Z Retrying single test...
2025-12-04T10:11:57.9035558Z W1204 10:05:07.003000 62034 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9035946Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6902f75d647c91e7.xml
2025-12-04T10:11:57.9036074Z ============================= test session starts ==============================
2025-12-04T10:11:57.9036287Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9036351Z cachedir: .pytest_cache
2025-12-04T10:11:57.9036655Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9036732Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9036797Z configfile: pytest.ini
2025-12-04T10:11:57.9037112Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9037241Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9037815Z stepcurrent: skipping 49 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9037893Z Running 1 items in this shard
2025-12-04T10:11:57.9037896Z 
2025-12-04T10:11:57.9038682Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:05:08.246299095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9038689Z 
2025-12-04T10:11:57.9038996Z [W1204 10:05:17.356698211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9039001Z 
2025-12-04T10:11:57.9039295Z [W1204 10:05:17.356955755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9039298Z 
2025-12-04T10:11:57.9039588Z [W1204 10:05:17.362761724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9039636Z 
2025-12-04T10:11:57.9039965Z [W1204 10:05:17.363352754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9039969Z 
2025-12-04T10:11:57.9040259Z [W1204 10:05:17.363510787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9040263Z 
2025-12-04T10:11:57.9040569Z [W1204 10:05:17.369115383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9040572Z 
2025-12-04T10:11:57.9040865Z [W1204 10:05:17.369648512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9040868Z 
2025-12-04T10:11:57.9041160Z [W1204 10:05:17.369841685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9041167Z 
2025-12-04T10:11:57.9041250Z ('RERUN', {'yellow': True}) [11.1581s] [100%]
2025-12-04T10:11:57.9041989Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:05:18.723504635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9041992Z 
2025-12-04T10:11:57.9042281Z [W1204 10:05:18.724053044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9042284Z 
2025-12-04T10:11:57.9042616Z [W1204 10:05:18.724203717 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9042620Z 
2025-12-04T10:11:57.9042909Z [W1204 10:05:18.727224248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9042946Z 
2025-12-04T10:11:57.9043236Z [W1204 10:05:18.727804698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9043243Z 
2025-12-04T10:11:57.9043531Z [W1204 10:05:18.727950201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9043535Z 
2025-12-04T10:11:57.9043820Z [W1204 10:05:18.732618070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9043823Z 
2025-12-04T10:11:57.9044115Z [W1204 10:05:18.733086668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9044118Z 
2025-12-04T10:11:57.9044403Z [W1204 10:05:18.733225721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9044408Z 
2025-12-04T10:11:57.9044498Z ('RERUN', {'yellow': True}) [0.5964s] [100%]
2025-12-04T10:11:57.9045262Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:05:19.315144211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9045267Z 
2025-12-04T10:11:57.9045559Z [W1204 10:05:19.315688540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9045562Z 
2025-12-04T10:11:57.9045850Z [W1204 10:05:19.315836872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9045854Z 
2025-12-04T10:11:57.9046140Z [W1204 10:05:19.318836144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9046148Z 
2025-12-04T10:11:57.9046470Z [W1204 10:05:19.319398803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9046473Z 
2025-12-04T10:11:57.9046763Z [W1204 10:05:19.319539856 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9046767Z 
2025-12-04T10:11:57.9047054Z [W1204 10:05:19.324189915 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9047057Z 
2025-12-04T10:11:57.9047345Z [W1204 10:05:19.324667033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9047348Z 
2025-12-04T10:11:57.9047639Z [W1204 10:05:19.324805616 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9047643Z 
2025-12-04T10:11:57.9047703Z FAILED [0.5925s] [100%]
2025-12-04T10:11:57.9047708Z 
2025-12-04T10:11:57.9047795Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9048093Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9048175Z Traceback (most recent call last):
2025-12-04T10:11:57.9048489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9048555Z     method(*args, **kwargs)
2025-12-04T10:11:57.9048888Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9048952Z     method(*args, **kwargs)
2025-12-04T10:11:57.9049240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9049336Z     with policy():
2025-12-04T10:11:57.9049633Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9049697Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9050510Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9050515Z 
2025-12-04T10:11:57.9050641Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9051172Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9051175Z 
2025-12-04T10:11:57.9051332Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9051466Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9051560Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9051942Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9052075Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9052135Z graph_break []
2025-12-04T10:11:57.9052257Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9052954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9053025Z   if out == self.unknown_value:
2025-12-04T10:11:57.9053386Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9053458Z Traceback (most recent call last):
2025-12-04T10:11:57.9053756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9053821Z     method(*args, **kwargs)
2025-12-04T10:11:57.9054113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9054177Z     method(*args, **kwargs)
2025-12-04T10:11:57.9054463Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9054522Z     with policy():
2025-12-04T10:11:57.9054819Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9054894Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9055731Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9055738Z 
2025-12-04T10:11:57.9055865Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9056425Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9056429Z 
2025-12-04T10:11:57.9056591Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9056717Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9056850Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9057196Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9057324Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9057385Z graph_break []
2025-12-04T10:11:57.9057506Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9058198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9058266Z   if out == self.unknown_value:
2025-12-04T10:11:57.9058387Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9058481Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9058605Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9058981Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9059044Z graph_break []
2025-12-04T10:11:57.9059126Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9059429Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9059500Z Traceback (most recent call last):
2025-12-04T10:11:57.9059797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9059865Z     method(*args, **kwargs)
2025-12-04T10:11:57.9060156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9060257Z     method(*args, **kwargs)
2025-12-04T10:11:57.9060549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9060610Z     with policy():
2025-12-04T10:11:57.9060909Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9060973Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9061798Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9061805Z 
2025-12-04T10:11:57.9061930Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9062454Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9062461Z 
2025-12-04T10:11:57.9062623Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9062744Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9062847Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9063226Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9063350Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9063413Z graph_break []
2025-12-04T10:11:57.9063537Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9064260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9064335Z   if out == self.unknown_value:
2025-12-04T10:11:57.9064459Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9064554Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9064673Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9065010Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9065073Z graph_break []
2025-12-04T10:11:57.9065193Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9065280Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9065404Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9065776Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9065838Z graph_break []
2025-12-04T10:11:57.9066324Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6902f75d647c91e7.xml -
2025-12-04T10:11:57.9066424Z =========================== short test summary info ============================
2025-12-04T10:11:57.9067740Z FAILED [0.5925s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9067779Z 
2025-12-04T10:11:57.9067908Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9068432Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9068436Z 
2025-12-04T10:11:57.9068591Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9068699Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9068814Z ================== 1 failed, 57 deselected, 2 rerun in 12.37s ==================
2025-12-04T10:11:57.9068873Z Got exit code 1
2025-12-04T10:11:57.9068941Z Retrying single test...
2025-12-04T10:11:57.9069205Z W1204 10:05:25.913000 62228 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9069597Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bee3dd53feb5961.xml
2025-12-04T10:11:57.9069692Z ============================= test session starts ==============================
2025-12-04T10:11:57.9069898Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9069969Z cachedir: .pytest_cache
2025-12-04T10:11:57.9070313Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9070393Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9070457Z configfile: pytest.ini
2025-12-04T10:11:57.9070771Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9070938Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9071513Z stepcurrent: skipping 49 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9071582Z Running 1 items in this shard
2025-12-04T10:11:57.9071589Z 
2025-12-04T10:11:57.9072330Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:05:27.138526815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9072334Z 
2025-12-04T10:11:57.9072634Z [W1204 10:05:36.133227938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9072645Z 
2025-12-04T10:11:57.9072938Z [W1204 10:05:36.133511293 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9072941Z 
2025-12-04T10:11:57.9073334Z [W1204 10:05:36.139256772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9073337Z 
2025-12-04T10:11:57.9073630Z [W1204 10:05:36.139842393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9073633Z 
2025-12-04T10:11:57.9073923Z [W1204 10:05:36.140054217 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9073926Z 
2025-12-04T10:11:57.9074217Z [W1204 10:05:36.145568332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9074222Z 
2025-12-04T10:11:57.9074508Z [W1204 10:05:36.146093350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9074545Z 
2025-12-04T10:11:57.9074838Z [W1204 10:05:36.146283974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9074842Z 
2025-12-04T10:11:57.9074920Z ('RERUN', {'yellow': True}) [11.0213s] [100%]
2025-12-04T10:11:57.9075651Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:05:37.478784383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9075658Z 
2025-12-04T10:11:57.9075946Z [W1204 10:05:37.479315333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9075951Z 
2025-12-04T10:11:57.9076240Z [W1204 10:05:37.479458905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9076243Z 
2025-12-04T10:11:57.9076532Z [W1204 10:05:37.482415196 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9076535Z 
2025-12-04T10:11:57.9076822Z [W1204 10:05:37.482978926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9076825Z 
2025-12-04T10:11:57.9077147Z [W1204 10:05:37.483116928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9077151Z 
2025-12-04T10:11:57.9077441Z [W1204 10:05:37.487528004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9077444Z 
2025-12-04T10:11:57.9077773Z [W1204 10:05:37.487986992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9077779Z 
2025-12-04T10:11:57.9078068Z [W1204 10:05:37.488132665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9078071Z 
2025-12-04T10:11:57.9078153Z ('RERUN', {'yellow': True}) [0.5755s] [100%]
2025-12-04T10:11:57.9078894Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:05:38.051796783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9078898Z 
2025-12-04T10:11:57.9079187Z [W1204 10:05:38.052337013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9079193Z 
2025-12-04T10:11:57.9079480Z [W1204 10:05:38.052485406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9079485Z 
2025-12-04T10:11:57.9079822Z [W1204 10:05:38.055420706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9079825Z 
2025-12-04T10:11:57.9080159Z [W1204 10:05:38.055980776 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9080163Z 
2025-12-04T10:11:57.9080452Z [W1204 10:05:38.056119729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9080455Z 
2025-12-04T10:11:57.9080745Z [W1204 10:05:38.060674738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9080748Z 
2025-12-04T10:11:57.9081039Z [W1204 10:05:38.061138326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9081080Z 
2025-12-04T10:11:57.9081377Z [W1204 10:05:38.061274698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9081380Z 
2025-12-04T10:11:57.9081440Z FAILED [0.5728s] [100%]
2025-12-04T10:11:57.9081443Z 
2025-12-04T10:11:57.9081527Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9081830Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9081909Z Traceback (most recent call last):
2025-12-04T10:11:57.9082222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9082285Z     method(*args, **kwargs)
2025-12-04T10:11:57.9082593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9082664Z     method(*args, **kwargs)
2025-12-04T10:11:57.9082957Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9083020Z     with policy():
2025-12-04T10:11:57.9083315Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9083380Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9084231Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9084236Z 
2025-12-04T10:11:57.9084365Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9084929Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9084933Z 
2025-12-04T10:11:57.9085090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9085216Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9085311Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9085661Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9085791Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9085847Z graph_break []
2025-12-04T10:11:57.9085969Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9086702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9086784Z   if out == self.unknown_value:
2025-12-04T10:11:57.9087087Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9087161Z Traceback (most recent call last):
2025-12-04T10:11:57.9087460Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9087528Z     method(*args, **kwargs)
2025-12-04T10:11:57.9087820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9087883Z     method(*args, **kwargs)
2025-12-04T10:11:57.9088177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9088279Z     with policy():
2025-12-04T10:11:57.9088579Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9088643Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9089471Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9089475Z 
2025-12-04T10:11:57.9089602Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9090132Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9090139Z 
2025-12-04T10:11:57.9090301Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9090429Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9090523Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9090869Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9090994Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9091090Z graph_break []
2025-12-04T10:11:57.9091213Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9091903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9092012Z   if out == self.unknown_value:
2025-12-04T10:11:57.9092135Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9092231Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9092353Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9092694Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9092757Z graph_break []
2025-12-04T10:11:57.9092841Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9093136Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9093211Z Traceback (most recent call last):
2025-12-04T10:11:57.9093507Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9093573Z     method(*args, **kwargs)
2025-12-04T10:11:57.9093897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9093960Z     method(*args, **kwargs)
2025-12-04T10:11:57.9094251Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9094309Z     with policy():
2025-12-04T10:11:57.9094608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9094671Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9095492Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9095531Z 
2025-12-04T10:11:57.9095661Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9096183Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9096186Z 
2025-12-04T10:11:57.9096349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9096473Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9096562Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9096906Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9097030Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9097092Z graph_break []
2025-12-04T10:11:57.9097215Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9097901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9097974Z   if out == self.unknown_value:
2025-12-04T10:11:57.9098136Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9098237Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9098368Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9098746Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9098811Z graph_break []
2025-12-04T10:11:57.9098932Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9099017Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9099139Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9099475Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9099536Z graph_break []
2025-12-04T10:11:57.9100023Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bee3dd53feb5961.xml -
2025-12-04T10:11:57.9100122Z =========================== short test summary info ============================
2025-12-04T10:11:57.9101473Z FAILED [0.5728s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9101478Z 
2025-12-04T10:11:57.9101601Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9102131Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9102135Z 
2025-12-04T10:11:57.9102292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9102433Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9102546Z ================== 1 failed, 57 deselected, 2 rerun in 12.19s ==================
2025-12-04T10:11:57.9102603Z Got exit code 1
2025-12-04T10:11:57.9103081Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9103324Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.9103592Z W1204 10:05:44.629000 62422 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9103978Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-17bc86173edb9567.xml
2025-12-04T10:11:57.9104075Z ============================= test session starts ==============================
2025-12-04T10:11:57.9104284Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9104349Z cachedir: .pytest_cache
2025-12-04T10:11:57.9104662Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9104737Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9104802Z configfile: pytest.ini
2025-12-04T10:11:57.9105122Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9105291Z collecting ... collected 58 items / 50 deselected / 8 selected
2025-12-04T10:11:57.9105390Z stepcurrent: skipping 50 already run items.
2025-12-04T10:11:57.9105466Z Running 8 items in this shard
2025-12-04T10:11:57.9105469Z 
2025-12-04T10:11:57.9105969Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8400s] [ 12%]
2025-12-04T10:11:57.9106513Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4448s] [ 12%]
2025-12-04T10:11:57.9106956Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4354s] [ 12%]
2025-12-04T10:11:57.9106959Z 
2025-12-04T10:11:57.9107042Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9107336Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9107410Z Traceback (most recent call last):
2025-12-04T10:11:57.9107721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9107787Z     method(*args, **kwargs)
2025-12-04T10:11:57.9108115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9108184Z     method(*args, **kwargs)
2025-12-04T10:11:57.9108478Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9108545Z     with policy():
2025-12-04T10:11:57.9108843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9108910Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9109718Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9109757Z 
2025-12-04T10:11:57.9109883Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9110408Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9110412Z 
2025-12-04T10:11:57.9110569Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9110694Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9110791Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9111135Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9111275Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9111335Z graph_break []
2025-12-04T10:11:57.9111627Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9111702Z Traceback (most recent call last):
2025-12-04T10:11:57.9111999Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9112067Z     method(*args, **kwargs)
2025-12-04T10:11:57.9112361Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9112458Z     method(*args, **kwargs)
2025-12-04T10:11:57.9112752Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9112810Z     with policy():
2025-12-04T10:11:57.9113104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9113210Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9114021Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9114025Z 
2025-12-04T10:11:57.9114153Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9114673Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9114677Z 
2025-12-04T10:11:57.9114834Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9114961Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9115050Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9115436Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9115560Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9115617Z graph_break []
2025-12-04T10:11:57.9115744Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9115832Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9115955Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9116294Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9116388Z graph_break []
2025-12-04T10:11:57.9116475Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9116767Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9116841Z Traceback (most recent call last):
2025-12-04T10:11:57.9117282Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9117360Z     method(*args, **kwargs)
2025-12-04T10:11:57.9117666Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9117735Z     method(*args, **kwargs)
2025-12-04T10:11:57.9118029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9118099Z     with policy():
2025-12-04T10:11:57.9118393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9118466Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9119282Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9119287Z 
2025-12-04T10:11:57.9119410Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9120033Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9120037Z 
2025-12-04T10:11:57.9120195Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9120374Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9120464Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9120806Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9120932Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9120989Z graph_break []
2025-12-04T10:11:57.9121114Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9121203Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9121324Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9121671Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9121731Z graph_break []
2025-12-04T10:11:57.9121852Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9121943Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9122114Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9122458Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9122514Z graph_break []
2025-12-04T10:11:57.9123005Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-17bc86173edb9567.xml -
2025-12-04T10:11:57.9123110Z =========================== short test summary info ============================
2025-12-04T10:11:57.9124401Z FAILED [0.4354s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9124462Z 
2025-12-04T10:11:57.9124595Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9125115Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9125118Z 
2025-12-04T10:11:57.9125277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9125386Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9125503Z ================== 1 failed, 50 deselected, 2 rerun in 2.74s ===================
2025-12-04T10:11:57.9125562Z Got exit code 1
2025-12-04T10:11:57.9125629Z Retrying single test...
2025-12-04T10:11:57.9125894Z W1204 10:05:54.360000 62610 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9126281Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-12cbce7f716a0669.xml
2025-12-04T10:11:57.9126376Z ============================= test session starts ==============================
2025-12-04T10:11:57.9126621Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9126687Z cachedir: .pytest_cache
2025-12-04T10:11:57.9126991Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9127104Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9127169Z configfile: pytest.ini
2025-12-04T10:11:57.9127485Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9127612Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9128179Z stepcurrent: skipping 50 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9128254Z Running 1 items in this shard
2025-12-04T10:11:57.9128258Z 
2025-12-04T10:11:57.9128989Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:05:55.414682188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9128996Z 
2025-12-04T10:11:57.9129300Z [W1204 10:06:04.554893326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9129303Z 
2025-12-04T10:11:57.9129629Z [W1204 10:06:04.555147050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9129634Z 
2025-12-04T10:11:57.9129935Z [W1204 10:06:04.561695764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9129939Z 
2025-12-04T10:11:57.9130229Z [W1204 10:06:04.562299774 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9130233Z 
2025-12-04T10:11:57.9130525Z [W1204 10:06:04.562471947 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9130530Z 
2025-12-04T10:11:57.9130852Z [W1204 10:06:04.567911641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9130856Z 
2025-12-04T10:11:57.9131144Z [W1204 10:06:04.568444900 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9131151Z 
2025-12-04T10:11:57.9131451Z [W1204 10:06:04.568637173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9131455Z 
2025-12-04T10:11:57.9131538Z ('RERUN', {'yellow': True}) [10.9993s] [100%]
2025-12-04T10:11:57.9132271Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:06:05.737461229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9132276Z 
2025-12-04T10:11:57.9132568Z [W1204 10:06:05.737988828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9132572Z 
2025-12-04T10:11:57.9132865Z [W1204 10:06:05.738137200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9132869Z 
2025-12-04T10:11:57.9133155Z [W1204 10:06:05.741079661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9133158Z 
2025-12-04T10:11:57.9133501Z [W1204 10:06:05.741644051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9133505Z 
2025-12-04T10:11:57.9133794Z [W1204 10:06:05.741785664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9133797Z 
2025-12-04T10:11:57.9134127Z [W1204 10:06:05.746234751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9134135Z 
2025-12-04T10:11:57.9134430Z [W1204 10:06:05.746692559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9134433Z 
2025-12-04T10:11:57.9134721Z [W1204 10:06:05.746828031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9134728Z 
2025-12-04T10:11:57.9138984Z ('RERUN', {'yellow': True}) [0.4077s] [100%]
2025-12-04T10:11:57.9139794Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:06:06.145598445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9139802Z 
2025-12-04T10:11:57.9140118Z [W1204 10:06:06.146126184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9140128Z 
2025-12-04T10:11:57.9140507Z [W1204 10:06:06.146270016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9140512Z 
2025-12-04T10:11:57.9140804Z [W1204 10:06:06.149135036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9140807Z 
2025-12-04T10:11:57.9141111Z [W1204 10:06:06.149694935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9141115Z 
2025-12-04T10:11:57.9141403Z [W1204 10:06:06.149834148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9141407Z 
2025-12-04T10:11:57.9141695Z [W1204 10:06:06.154235064 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9141737Z 
2025-12-04T10:11:57.9142024Z [W1204 10:06:06.154689152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9142027Z 
2025-12-04T10:11:57.9142311Z [W1204 10:06:06.154826154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9142314Z 
2025-12-04T10:11:57.9142378Z FAILED [0.4048s] [100%]
2025-12-04T10:11:57.9142382Z 
2025-12-04T10:11:57.9142474Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9142781Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9142858Z Traceback (most recent call last):
2025-12-04T10:11:57.9143180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9143250Z     method(*args, **kwargs)
2025-12-04T10:11:57.9143555Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9143622Z     method(*args, **kwargs)
2025-12-04T10:11:57.9143916Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9143977Z     with policy():
2025-12-04T10:11:57.9144279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9144382Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9145196Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9145235Z 
2025-12-04T10:11:57.9145383Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9145918Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9145922Z 
2025-12-04T10:11:57.9146082Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9146222Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9146326Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9146682Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9146816Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9146877Z graph_break []
2025-12-04T10:11:57.9147009Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9147750Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9147825Z   if out == self.unknown_value:
2025-12-04T10:11:57.9148135Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9148212Z Traceback (most recent call last):
2025-12-04T10:11:57.9148517Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9148586Z     method(*args, **kwargs)
2025-12-04T10:11:57.9148876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9148976Z     method(*args, **kwargs)
2025-12-04T10:11:57.9149268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9149328Z     with policy():
2025-12-04T10:11:57.9149621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9149684Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9150505Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9150510Z 
2025-12-04T10:11:57.9150648Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9151173Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9151178Z 
2025-12-04T10:11:57.9151358Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9151488Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9151585Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9151974Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9152103Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9152165Z graph_break []
2025-12-04T10:11:57.9152289Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9153026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9153102Z   if out == self.unknown_value:
2025-12-04T10:11:57.9153230Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9153327Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9153457Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9153810Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9153870Z graph_break []
2025-12-04T10:11:57.9153953Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9154252Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9154331Z Traceback (most recent call last):
2025-12-04T10:11:57.9154673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9154742Z     method(*args, **kwargs)
2025-12-04T10:11:57.9155035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9155098Z     method(*args, **kwargs)
2025-12-04T10:11:57.9155396Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9155454Z     with policy():
2025-12-04T10:11:57.9155748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9155817Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9156680Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9156685Z 
2025-12-04T10:11:57.9156820Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9157344Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9157347Z 
2025-12-04T10:11:57.9157511Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9157635Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9157727Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9158079Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9158205Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9158267Z graph_break []
2025-12-04T10:11:57.9158389Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9159118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9159193Z   if out == self.unknown_value:
2025-12-04T10:11:57.9159315Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9159439Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9159566Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9159966Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9160028Z graph_break []
2025-12-04T10:11:57.9160155Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9160244Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9160370Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9160711Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9160771Z graph_break []
2025-12-04T10:11:57.9161267Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-12cbce7f716a0669.xml -
2025-12-04T10:11:57.9161370Z =========================== short test summary info ============================
2025-12-04T10:11:57.9162706Z FAILED [0.4048s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9162712Z 
2025-12-04T10:11:57.9162840Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9163363Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9163428Z 
2025-12-04T10:11:57.9163588Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9163696Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9163819Z ================== 1 failed, 57 deselected, 2 rerun in 11.84s ==================
2025-12-04T10:11:57.9163878Z Got exit code 1
2025-12-04T10:11:57.9163945Z Retrying single test...
2025-12-04T10:11:57.9164210Z W1204 10:06:12.736000 62803 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9164599Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71532e1cbeaa1931.xml
2025-12-04T10:11:57.9164700Z ============================= test session starts ==============================
2025-12-04T10:11:57.9164914Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9164989Z cachedir: .pytest_cache
2025-12-04T10:11:57.9165304Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9165381Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9165453Z configfile: pytest.ini
2025-12-04T10:11:57.9165773Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9165908Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9166519Z stepcurrent: skipping 50 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9166593Z Running 1 items in this shard
2025-12-04T10:11:57.9166629Z 
2025-12-04T10:11:57.9167371Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:06:13.794462794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9167376Z 
2025-12-04T10:11:57.9167676Z [W1204 10:06:23.019741268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9167679Z 
2025-12-04T10:11:57.9167974Z [W1204 10:06:23.019977622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9167979Z 
2025-12-04T10:11:57.9168265Z [W1204 10:06:23.025720799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9168268Z 
2025-12-04T10:11:57.9168556Z [W1204 10:06:23.026256069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9168562Z 
2025-12-04T10:11:57.9168882Z [W1204 10:06:23.026421983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9168886Z 
2025-12-04T10:11:57.9169178Z [W1204 10:06:23.031703191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9169182Z 
2025-12-04T10:11:57.9169466Z [W1204 10:06:23.032209951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9169469Z 
2025-12-04T10:11:57.9169755Z [W1204 10:06:23.032408764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9169762Z 
2025-12-04T10:11:57.9169842Z ('RERUN', {'yellow': True}) [11.0903s] [100%]
2025-12-04T10:11:57.9170565Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:06:24.206161433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9170602Z 
2025-12-04T10:11:57.9170902Z [W1204 10:06:24.206684962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9170906Z 
2025-12-04T10:11:57.9171196Z [W1204 10:06:24.206825775 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9171200Z 
2025-12-04T10:11:57.9171490Z [W1204 10:06:24.209765580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9171493Z 
2025-12-04T10:11:57.9171782Z [W1204 10:06:24.210333101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9171789Z 
2025-12-04T10:11:57.9172076Z [W1204 10:06:24.210475553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9172080Z 
2025-12-04T10:11:57.9172365Z [W1204 10:06:24.214955626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9172369Z 
2025-12-04T10:11:57.9172658Z [W1204 10:06:24.215403904 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9172661Z 
2025-12-04T10:11:57.9172984Z [W1204 10:06:24.215541057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9172987Z 
2025-12-04T10:11:57.9173068Z ('RERUN', {'yellow': True}) [0.4188s] [100%]
2025-12-04T10:11:57.9173792Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:06:24.623284164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9173831Z 
2025-12-04T10:11:57.9174118Z [W1204 10:06:24.623800884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9174121Z 
2025-12-04T10:11:57.9174408Z [W1204 10:06:24.623944016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9174412Z 
2025-12-04T10:11:57.9174699Z [W1204 10:06:24.626813530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9174702Z 
2025-12-04T10:11:57.9174988Z [W1204 10:06:24.627353890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9174994Z 
2025-12-04T10:11:57.9175279Z [W1204 10:06:24.627495742 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9175315Z 
2025-12-04T10:11:57.9175603Z [W1204 10:06:24.631968036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9175606Z 
2025-12-04T10:11:57.9175891Z [W1204 10:06:24.632428425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9175894Z 
2025-12-04T10:11:57.9176180Z [W1204 10:06:24.632570247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9176188Z 
2025-12-04T10:11:57.9176259Z FAILED [0.4142s] [100%]
2025-12-04T10:11:57.9176262Z 
2025-12-04T10:11:57.9176347Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9176685Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9176759Z Traceback (most recent call last):
2025-12-04T10:11:57.9177071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9177141Z     method(*args, **kwargs)
2025-12-04T10:11:57.9177433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9177497Z     method(*args, **kwargs)
2025-12-04T10:11:57.9177787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9177845Z     with policy():
2025-12-04T10:11:57.9178150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9178219Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9179024Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9179028Z 
2025-12-04T10:11:57.9179159Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9179717Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9179725Z 
2025-12-04T10:11:57.9179885Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9180011Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9180148Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9180497Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9180624Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9180695Z graph_break []
2025-12-04T10:11:57.9180822Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9181521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9181590Z   if out == self.unknown_value:
2025-12-04T10:11:57.9181878Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9181958Z Traceback (most recent call last):
2025-12-04T10:11:57.9182255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9182359Z     method(*args, **kwargs)
2025-12-04T10:11:57.9182654Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9182716Z     method(*args, **kwargs)
2025-12-04T10:11:57.9183009Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9183067Z     with policy():
2025-12-04T10:11:57.9183360Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9183430Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9184243Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9184284Z 
2025-12-04T10:11:57.9184417Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9184937Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9184941Z 
2025-12-04T10:11:57.9185101Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9185224Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9185315Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9185669Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9185801Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9185858Z graph_break []
2025-12-04T10:11:57.9185984Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9186671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9186742Z   if out == self.unknown_value:
2025-12-04T10:11:57.9186898Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9186989Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9187115Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9187567Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9187630Z graph_break []
2025-12-04T10:11:57.9187724Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9188016Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9188093Z Traceback (most recent call last):
2025-12-04T10:11:57.9188388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9188453Z     method(*args, **kwargs)
2025-12-04T10:11:57.9188745Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9188806Z     method(*args, **kwargs)
2025-12-04T10:11:57.9189098Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9189158Z     with policy():
2025-12-04T10:11:57.9189510Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9189583Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9190396Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9190401Z 
2025-12-04T10:11:57.9190529Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9191049Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9191088Z 
2025-12-04T10:11:57.9191247Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9191371Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9191460Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9191800Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9191920Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9191978Z graph_break []
2025-12-04T10:11:57.9192115Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9192801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9192876Z   if out == self.unknown_value:
2025-12-04T10:11:57.9192997Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9193085Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9193211Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9193549Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9193611Z graph_break []
2025-12-04T10:11:57.9193765Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9193853Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9193981Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9194358Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9194417Z graph_break []
2025-12-04T10:11:57.9194909Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71532e1cbeaa1931.xml -
2025-12-04T10:11:57.9195007Z =========================== short test summary info ============================
2025-12-04T10:11:57.9196297Z FAILED [0.4142s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9196304Z 
2025-12-04T10:11:57.9196427Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9196983Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9196988Z 
2025-12-04T10:11:57.9197143Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9197246Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9197372Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ==================
2025-12-04T10:11:57.9197430Z Got exit code 1
2025-12-04T10:11:57.9197907Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9198183Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.9198445Z W1204 10:06:31.240000 62996 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9198836Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1cbc5ac56a047f28.xml
2025-12-04T10:11:57.9198928Z ============================= test session starts ==============================
2025-12-04T10:11:57.9199137Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9199204Z cachedir: .pytest_cache
2025-12-04T10:11:57.9199509Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9199587Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9199653Z configfile: pytest.ini
2025-12-04T10:11:57.9200007Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9200152Z collecting ... collected 58 items / 51 deselected / 7 selected
2025-12-04T10:11:57.9200241Z stepcurrent: skipping 51 already run items.
2025-12-04T10:11:57.9200312Z Running 7 items in this shard
2025-12-04T10:11:57.9200316Z 
2025-12-04T10:11:57.9200815Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.9319s] [ 14%]
2025-12-04T10:11:57.9201334Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5388s] [ 14%]
2025-12-04T10:11:57.9201780Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5597s] [ 14%]
2025-12-04T10:11:57.9201818Z 
2025-12-04T10:11:57.9201903Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9202204Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9202280Z Traceback (most recent call last):
2025-12-04T10:11:57.9202589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9202653Z     method(*args, **kwargs)
2025-12-04T10:11:57.9202947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9203015Z     method(*args, **kwargs)
2025-12-04T10:11:57.9203304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9203367Z     with policy():
2025-12-04T10:11:57.9203670Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9203734Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9204571Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9204576Z 
2025-12-04T10:11:57.9204701Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9205220Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9205228Z 
2025-12-04T10:11:57.9205385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9205546Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9205644Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9206192Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9206324Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9206381Z graph_break []
2025-12-04T10:11:57.9206669Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9206747Z Traceback (most recent call last):
2025-12-04T10:11:57.9207050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9207114Z     method(*args, **kwargs)
2025-12-04T10:11:57.9207407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9207470Z     method(*args, **kwargs)
2025-12-04T10:11:57.9207761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9207818Z     with policy():
2025-12-04T10:11:57.9208111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9208178Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9209022Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9209060Z 
2025-12-04T10:11:57.9209186Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9209706Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9209709Z 
2025-12-04T10:11:57.9209864Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9209989Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9210082Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9210629Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9210754Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9210810Z graph_break []
2025-12-04T10:11:57.9210938Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9211060Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9211180Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9211719Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9211779Z graph_break []
2025-12-04T10:11:57.9211862Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9212146Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9212255Z Traceback (most recent call last):
2025-12-04T10:11:57.9212552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9212615Z     method(*args, **kwargs)
2025-12-04T10:11:57.9212917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9212980Z     method(*args, **kwargs)
2025-12-04T10:11:57.9213269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9213338Z     with policy():
2025-12-04T10:11:57.9213639Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9213704Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9214519Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9214526Z 
2025-12-04T10:11:57.9214649Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9215167Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9215170Z 
2025-12-04T10:11:57.9215326Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9215489Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9215579Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9216117Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9216296Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9216355Z graph_break []
2025-12-04T10:11:57.9216478Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9216566Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9216684Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9217386Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9217447Z graph_break []
2025-12-04T10:11:57.9217566Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9217657Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9217775Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9218382Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9218441Z graph_break []
2025-12-04T10:11:57.9218929Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1cbc5ac56a047f28.xml -
2025-12-04T10:11:57.9219033Z =========================== short test summary info ============================
2025-12-04T10:11:57.9220320Z FAILED [0.5597s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9220379Z 
2025-12-04T10:11:57.9220503Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9221018Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9221022Z 
2025-12-04T10:11:57.9221182Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9221287Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9221409Z ================== 1 failed, 51 deselected, 2 rerun in 3.05s ===================
2025-12-04T10:11:57.9221474Z Got exit code 1
2025-12-04T10:11:57.9221542Z Retrying single test...
2025-12-04T10:11:57.9221807Z W1204 10:06:40.897000 63185 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9222191Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a3505d51f13f273.xml
2025-12-04T10:11:57.9222285Z ============================= test session starts ==============================
2025-12-04T10:11:57.9222494Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9222609Z cachedir: .pytest_cache
2025-12-04T10:11:57.9222918Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9222993Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9223102Z configfile: pytest.ini
2025-12-04T10:11:57.9223422Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9223551Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9224125Z stepcurrent: skipping 51 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9224197Z Running 1 items in this shard
2025-12-04T10:11:57.9224200Z 
2025-12-04T10:11:57.9224936Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:06:42.482388401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9224940Z 
2025-12-04T10:11:57.9225239Z [W1204 10:06:51.540583224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9225245Z 
2025-12-04T10:11:57.9225572Z [W1204 10:06:51.540848839 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9225576Z 
2025-12-04T10:11:57.9225864Z [W1204 10:06:51.546980614 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9225867Z 
2025-12-04T10:11:57.9226154Z [W1204 10:06:51.547564244 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9226158Z 
2025-12-04T10:11:57.9226442Z [W1204 10:06:51.547747038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9226446Z 
2025-12-04T10:11:57.9226739Z [W1204 10:06:51.553126270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9226782Z 
2025-12-04T10:11:57.9227073Z [W1204 10:06:51.553669109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9227077Z 
2025-12-04T10:11:57.9227363Z [W1204 10:06:51.553828662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9227367Z 
2025-12-04T10:11:57.9227451Z ('RERUN', {'yellow': True}) [11.0030s] [100%]
2025-12-04T10:11:57.9228184Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:06:52.362854420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9228188Z 
2025-12-04T10:11:57.9228478Z [W1204 10:06:52.363377209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9228484Z 
2025-12-04T10:11:57.9228770Z [W1204 10:06:52.363517842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9228774Z 
2025-12-04T10:11:57.9229065Z [W1204 10:06:52.366490043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9229068Z 
2025-12-04T10:11:57.9229351Z [W1204 10:06:52.366955681 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9229355Z 
2025-12-04T10:11:57.9229676Z [W1204 10:06:52.367092183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9229683Z 
2025-12-04T10:11:57.9229969Z [W1204 10:06:52.371801745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9230006Z 
2025-12-04T10:11:57.9230292Z [W1204 10:06:52.372269282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9230296Z 
2025-12-04T10:11:57.9230585Z [W1204 10:06:52.372418445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9230588Z 
2025-12-04T10:11:57.9230666Z ('RERUN', {'yellow': True}) [0.5032s] [100%]
2025-12-04T10:11:57.9231393Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:06:52.862613014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9231397Z 
2025-12-04T10:11:57.9231682Z [W1204 10:06:52.863139553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9231688Z 
2025-12-04T10:11:57.9231978Z [W1204 10:06:52.863278995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9232013Z 
2025-12-04T10:11:57.9232300Z [W1204 10:06:52.866259027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9232303Z 
2025-12-04T10:11:57.9232590Z [W1204 10:06:52.866716695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9232594Z 
2025-12-04T10:11:57.9232881Z [W1204 10:06:52.866855777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9232885Z 
2025-12-04T10:11:57.9233171Z [W1204 10:06:52.871576988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9233211Z 
2025-12-04T10:11:57.9233496Z [W1204 10:06:52.872053757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9233500Z 
2025-12-04T10:11:57.9233788Z [W1204 10:06:52.872191459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9233791Z 
2025-12-04T10:11:57.9233854Z FAILED [0.4999s] [100%]
2025-12-04T10:11:57.9233857Z 
2025-12-04T10:11:57.9233938Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9234231Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9234304Z Traceback (most recent call last):
2025-12-04T10:11:57.9234608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9234677Z     method(*args, **kwargs)
2025-12-04T10:11:57.9234968Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9235031Z     method(*args, **kwargs)
2025-12-04T10:11:57.9235323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9235381Z     with policy():
2025-12-04T10:11:57.9235674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9235739Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9236576Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9236617Z 
2025-12-04T10:11:57.9236746Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9237268Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9237272Z 
2025-12-04T10:11:57.9237437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9237562Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9237658Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9238212Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9238340Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9238401Z graph_break []
2025-12-04T10:11:57.9238523Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9239248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9239319Z   if out == self.unknown_value:
2025-12-04T10:11:57.9239608Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9239684Z Traceback (most recent call last):
2025-12-04T10:11:57.9240027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9240091Z     method(*args, **kwargs)
2025-12-04T10:11:57.9240387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9240487Z     method(*args, **kwargs)
2025-12-04T10:11:57.9240780Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9240837Z     with policy():
2025-12-04T10:11:57.9241129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9241198Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9242011Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9242015Z 
2025-12-04T10:11:57.9242142Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9242664Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9242668Z 
2025-12-04T10:11:57.9242824Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9242952Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9243044Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9243640Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9243766Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9243857Z graph_break []
2025-12-04T10:11:57.9243985Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9244677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9244749Z   if out == self.unknown_value:
2025-12-04T10:11:57.9244866Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9244954Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9245079Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9245617Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9245680Z graph_break []
2025-12-04T10:11:57.9245761Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9246082Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9246159Z Traceback (most recent call last):
2025-12-04T10:11:57.9246461Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9246524Z     method(*args, **kwargs)
2025-12-04T10:11:57.9246819Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9246879Z     method(*args, **kwargs)
2025-12-04T10:11:57.9247174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9247230Z     with policy():
2025-12-04T10:11:57.9247524Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9247626Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9248444Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9248448Z 
2025-12-04T10:11:57.9248573Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9249090Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9249094Z 
2025-12-04T10:11:57.9249250Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9249375Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9249465Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9250005Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9250128Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9250197Z graph_break []
2025-12-04T10:11:57.9250356Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9251042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9251148Z   if out == self.unknown_value:
2025-12-04T10:11:57.9251270Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9251358Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9251484Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9252021Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9252082Z graph_break []
2025-12-04T10:11:57.9252203Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9252289Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9252409Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9252942Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9253005Z graph_break []
2025-12-04T10:11:57.9253522Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a3505d51f13f273.xml -
2025-12-04T10:11:57.9253623Z =========================== short test summary info ============================
2025-12-04T10:11:57.9254923Z FAILED [0.4999s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9254963Z 
2025-12-04T10:11:57.9255088Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9255610Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9255614Z 
2025-12-04T10:11:57.9255771Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9255877Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9255995Z ================== 1 failed, 57 deselected, 2 rerun in 12.03s ==================
2025-12-04T10:11:57.9256051Z Got exit code 1
2025-12-04T10:11:57.9256116Z Retrying single test...
2025-12-04T10:11:57.9256379Z W1204 10:06:59.467000 63379 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9256765Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4bc023f248c82374.xml
2025-12-04T10:11:57.9256861Z ============================= test session starts ==============================
2025-12-04T10:11:57.9257063Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9257128Z cachedir: .pytest_cache
2025-12-04T10:11:57.9257428Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9257503Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9257605Z configfile: pytest.ini
2025-12-04T10:11:57.9257919Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9258043Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9258651Z stepcurrent: skipping 51 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9258722Z Running 1 items in this shard
2025-12-04T10:11:57.9258726Z 
2025-12-04T10:11:57.9259456Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:07:01.053437328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9259459Z 
2025-12-04T10:11:57.9259756Z [W1204 10:07:10.171037329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9259760Z 
2025-12-04T10:11:57.9260048Z [W1204 10:07:10.171293284 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9260054Z 
2025-12-04T10:11:57.9260338Z [W1204 10:07:10.177385679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9260380Z 
2025-12-04T10:11:57.9260671Z [W1204 10:07:10.177993179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9260674Z 
2025-12-04T10:11:57.9260961Z [W1204 10:07:10.178175352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9260965Z 
2025-12-04T10:11:57.9261253Z [W1204 10:07:10.183636186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9261256Z 
2025-12-04T10:11:57.9261541Z [W1204 10:07:10.184182616 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9261579Z 
2025-12-04T10:11:57.9261864Z [W1204 10:07:10.184362609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9261867Z 
2025-12-04T10:11:57.9261952Z ('RERUN', {'yellow': True}) [11.0608s] [100%]
2025-12-04T10:11:57.9262676Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:07:11.984501376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9262680Z 
2025-12-04T10:11:57.9262974Z [W1204 10:07:11.985012075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9262977Z 
2025-12-04T10:11:57.9263268Z [W1204 10:07:11.985157687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9263274Z 
2025-12-04T10:11:57.9263563Z [W1204 10:07:11.988012786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9263568Z 
2025-12-04T10:11:57.9263856Z [W1204 10:07:11.988468864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9263859Z 
2025-12-04T10:11:57.9264146Z [W1204 10:07:11.988607256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9264149Z 
2025-12-04T10:11:57.9264467Z [W1204 10:07:11.993064913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9264470Z 
2025-12-04T10:11:57.9264759Z [W1204 10:07:11.993516891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9264798Z 
2025-12-04T10:11:57.9265087Z [W1204 10:07:11.993652863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9265090Z 
2025-12-04T10:11:57.9265169Z ('RERUN', {'yellow': True}) [0.4940s] [100%]
2025-12-04T10:11:57.9265887Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:07:11.475107606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9265890Z 
2025-12-04T10:11:57.9266176Z [W1204 10:07:11.475619505 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9266179Z 
2025-12-04T10:11:57.9266467Z [W1204 10:07:11.475764247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9266473Z 
2025-12-04T10:11:57.9266757Z [W1204 10:07:11.478611526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9266760Z 
2025-12-04T10:11:57.9267085Z [W1204 10:07:11.479061044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9267088Z 
2025-12-04T10:11:57.9267375Z [W1204 10:07:11.479201167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9267378Z 
2025-12-04T10:11:57.9267672Z [W1204 10:07:11.483654383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9267675Z 
2025-12-04T10:11:57.9267961Z [W1204 10:07:11.484113961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9268026Z 
2025-12-04T10:11:57.9268315Z [W1204 10:07:11.484251294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9268324Z 
2025-12-04T10:11:57.9268385Z FAILED [0.4922s] [100%]
2025-12-04T10:11:57.9268389Z 
2025-12-04T10:11:57.9268470Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9268764Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9268835Z Traceback (most recent call last):
2025-12-04T10:11:57.9269148Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9269215Z     method(*args, **kwargs)
2025-12-04T10:11:57.9269508Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9269574Z     method(*args, **kwargs)
2025-12-04T10:11:57.9269863Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9269922Z     with policy():
2025-12-04T10:11:57.9270220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9270283Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9271124Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9271131Z 
2025-12-04T10:11:57.9271262Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9271783Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9271821Z 
2025-12-04T10:11:57.9271982Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9272115Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9272215Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9272762Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9272889Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9272948Z graph_break []
2025-12-04T10:11:57.9273072Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9273773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9273878Z   if out == self.unknown_value:
2025-12-04T10:11:57.9274169Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9274244Z Traceback (most recent call last):
2025-12-04T10:11:57.9274538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9274603Z     method(*args, **kwargs)
2025-12-04T10:11:57.9274892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9274952Z     method(*args, **kwargs)
2025-12-04T10:11:57.9275244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9275340Z     with policy():
2025-12-04T10:11:57.9275636Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9275702Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9276511Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9276517Z 
2025-12-04T10:11:57.9276642Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9277160Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9277167Z 
2025-12-04T10:11:57.9277326Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9277449Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9277539Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9278082Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9278240Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9278297Z graph_break []
2025-12-04T10:11:57.9278420Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9279106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9279211Z   if out == self.unknown_value:
2025-12-04T10:11:57.9279330Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9279420Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9279543Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9280130Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9280191Z graph_break []
2025-12-04T10:11:57.9280272Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9280558Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9280639Z Traceback (most recent call last):
2025-12-04T10:11:57.9280972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9281037Z     method(*args, **kwargs)
2025-12-04T10:11:57.9281327Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9281388Z     method(*args, **kwargs)
2025-12-04T10:11:57.9281679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9281737Z     with policy():
2025-12-04T10:11:57.9282030Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9282096Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9282911Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9282951Z 
2025-12-04T10:11:57.9283076Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9283593Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9283597Z 
2025-12-04T10:11:57.9283753Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9283874Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9283976Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9284521Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9284647Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9284705Z graph_break []
2025-12-04T10:11:57.9284824Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9285542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9285614Z   if out == self.unknown_value:
2025-12-04T10:11:57.9285733Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9285821Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9285977Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9286520Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9286579Z graph_break []
2025-12-04T10:11:57.9286699Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9286785Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9286908Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9287440Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9287501Z graph_break []
2025-12-04T10:11:57.9287990Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4bc023f248c82374.xml -
2025-12-04T10:11:57.9288124Z =========================== short test summary info ============================
2025-12-04T10:11:57.9289415Z FAILED [0.4922s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9289419Z 
2025-12-04T10:11:57.9289542Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9290062Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9290099Z 
2025-12-04T10:11:57.9290253Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9290357Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9290472Z ================== 1 failed, 57 deselected, 2 rerun in 12.07s ==================
2025-12-04T10:11:57.9290529Z Got exit code 1
2025-12-04T10:11:57.9291008Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9291250Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.9291512Z W1204 10:07:18.111000 63572 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9291902Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-80114f319d6e3dd1.xml
2025-12-04T10:11:57.9291996Z ============================= test session starts ==============================
2025-12-04T10:11:57.9292204Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9292269Z cachedir: .pytest_cache
2025-12-04T10:11:57.9292573Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9292682Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9292749Z configfile: pytest.ini
2025-12-04T10:11:57.9293073Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9293237Z collecting ... collected 58 items / 52 deselected / 6 selected
2025-12-04T10:11:57.9293326Z stepcurrent: skipping 52 already run items.
2025-12-04T10:11:57.9293397Z Running 6 items in this shard
2025-12-04T10:11:57.9293401Z 
2025-12-04T10:11:57.9293902Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8971s] [ 16%]
2025-12-04T10:11:57.9294396Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4827s] [ 16%]
2025-12-04T10:11:57.9294839Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4744s] [ 16%]
2025-12-04T10:11:57.9294842Z 
2025-12-04T10:11:57.9294926Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9295219Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9295291Z Traceback (most recent call last):
2025-12-04T10:11:57.9295651Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9295716Z     method(*args, **kwargs)
2025-12-04T10:11:57.9296010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9296079Z     method(*args, **kwargs)
2025-12-04T10:11:57.9296366Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9296428Z     with policy():
2025-12-04T10:11:57.9296721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9296824Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9297642Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9297647Z 
2025-12-04T10:11:57.9297773Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9298306Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9298310Z 
2025-12-04T10:11:57.9298466Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9298592Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9298689Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9299040Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9299169Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9299225Z graph_break []
2025-12-04T10:11:57.9299726Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9299808Z Traceback (most recent call last):
2025-12-04T10:11:57.9300247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9300319Z     method(*args, **kwargs)
2025-12-04T10:11:57.9300612Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9300712Z     method(*args, **kwargs)
2025-12-04T10:11:57.9301003Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9301070Z     with policy():
2025-12-04T10:11:57.9301368Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9301436Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9302259Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9302263Z 
2025-12-04T10:11:57.9302398Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9302919Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9302925Z 
2025-12-04T10:11:57.9303121Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9303252Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9303345Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9303691Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9303819Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9303876Z graph_break []
2025-12-04T10:11:57.9304001Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9304087Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9304247Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9304589Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9304647Z graph_break []
2025-12-04T10:11:57.9304743Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9305038Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9305115Z Traceback (most recent call last):
2025-12-04T10:11:57.9305415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9305478Z     method(*args, **kwargs)
2025-12-04T10:11:57.9305772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9305837Z     method(*args, **kwargs)
2025-12-04T10:11:57.9306125Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9306186Z     with policy():
2025-12-04T10:11:57.9306480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9306550Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9307402Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9307407Z 
2025-12-04T10:11:57.9307536Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9308059Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9308107Z 
2025-12-04T10:11:57.9308269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9308399Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9308488Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9308826Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9308954Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9309010Z graph_break []
2025-12-04T10:11:57.9309136Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9309222Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9309344Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9309718Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9309775Z graph_break []
2025-12-04T10:11:57.9309898Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9309984Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9310103Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9310441Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9310497Z graph_break []
2025-12-04T10:11:57.9310983Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-80114f319d6e3dd1.xml -
2025-12-04T10:11:57.9311124Z =========================== short test summary info ============================
2025-12-04T10:11:57.9312418Z FAILED [0.4744s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9312424Z 
2025-12-04T10:11:57.9312550Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9313069Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9313076Z 
2025-12-04T10:11:57.9313234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9313346Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9313464Z ================== 1 failed, 52 deselected, 2 rerun in 2.88s ===================
2025-12-04T10:11:57.9313523Z Got exit code 1
2025-12-04T10:11:57.9313587Z Retrying single test...
2025-12-04T10:11:57.9313855Z W1204 10:07:27.817000 63760 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9314273Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5d4e24682433b20.xml
2025-12-04T10:11:57.9314370Z ============================= test session starts ==============================
2025-12-04T10:11:57.9314578Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9314683Z cachedir: .pytest_cache
2025-12-04T10:11:57.9314994Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9315070Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9315136Z configfile: pytest.ini
2025-12-04T10:11:57.9315454Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9315585Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9316155Z stepcurrent: skipping 52 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9316229Z Running 1 items in this shard
2025-12-04T10:11:57.9316233Z 
2025-12-04T10:11:57.9317184Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:07:28.906628038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9317192Z 
2025-12-04T10:11:57.9317515Z [W1204 10:07:38.945596298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9317519Z 
2025-12-04T10:11:57.9317815Z [W1204 10:07:38.945849963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9317819Z 
2025-12-04T10:11:57.9318124Z [W1204 10:07:38.951664283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9318127Z 
2025-12-04T10:11:57.9318415Z [W1204 10:07:38.952206972 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9318484Z 
2025-12-04T10:11:57.9318780Z [W1204 10:07:38.952392835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9318784Z 
2025-12-04T10:11:57.9319071Z [W1204 10:07:38.957888760 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9319075Z 
2025-12-04T10:11:57.9319359Z [W1204 10:07:38.958406729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9319365Z 
2025-12-04T10:11:57.9319655Z [W1204 10:07:38.958572142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9319658Z 
2025-12-04T10:11:57.9319740Z ('RERUN', {'yellow': True}) [10.9365s] [100%]
2025-12-04T10:11:57.9320571Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:07:39.173261433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9320578Z 
2025-12-04T10:11:57.9320870Z [W1204 10:07:39.173790392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9320873Z 
2025-12-04T10:11:57.9321162Z [W1204 10:07:39.173936125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9321166Z 
2025-12-04T10:11:57.9321506Z [W1204 10:07:39.176935076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9321510Z 
2025-12-04T10:11:57.9321804Z [W1204 10:07:39.177499366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9321872Z 
2025-12-04T10:11:57.9322160Z [W1204 10:07:39.177638219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9322165Z 
2025-12-04T10:11:57.9322453Z [W1204 10:07:39.182331929 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9322456Z 
2025-12-04T10:11:57.9322741Z [W1204 10:07:39.182798837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9322744Z 
2025-12-04T10:11:57.9323029Z [W1204 10:07:39.182937510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9323035Z 
2025-12-04T10:11:57.9323115Z ('RERUN', {'yellow': True}) [0.4570s] [100%]
2025-12-04T10:11:57.9323843Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:07:39.627971816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9323898Z 
2025-12-04T10:11:57.9324191Z [W1204 10:07:39.628511676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9324194Z 
2025-12-04T10:11:57.9324478Z [W1204 10:07:39.628652668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9324481Z 
2025-12-04T10:11:57.9324772Z [W1204 10:07:39.631660340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9324775Z 
2025-12-04T10:11:57.9325059Z [W1204 10:07:39.632221800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9325098Z 
2025-12-04T10:11:57.9325386Z [W1204 10:07:39.632368882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9325389Z 
2025-12-04T10:11:57.9325676Z [W1204 10:07:39.637012922 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9325679Z 
2025-12-04T10:11:57.9325967Z [W1204 10:07:39.637481570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9325970Z 
2025-12-04T10:11:57.9326256Z [W1204 10:07:39.637618813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9326259Z 
2025-12-04T10:11:57.9326329Z FAILED [0.4524s] [100%]
2025-12-04T10:11:57.9326333Z 
2025-12-04T10:11:57.9326420Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9326722Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9326798Z Traceback (most recent call last):
2025-12-04T10:11:57.9327112Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9327176Z     method(*args, **kwargs)
2025-12-04T10:11:57.9327475Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9327539Z     method(*args, **kwargs)
2025-12-04T10:11:57.9327874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9327938Z     with policy():
2025-12-04T10:11:57.9328231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9328336Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9329145Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9329149Z 
2025-12-04T10:11:57.9329281Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9329807Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9329810Z 
2025-12-04T10:11:57.9329968Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9330101Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9330198Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9330587Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9330726Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9330787Z graph_break []
2025-12-04T10:11:57.9330917Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9331613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9331684Z   if out == self.unknown_value:
2025-12-04T10:11:57.9331978Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9332087Z Traceback (most recent call last):
2025-12-04T10:11:57.9332391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9332454Z     method(*args, **kwargs)
2025-12-04T10:11:57.9332748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9332813Z     method(*args, **kwargs)
2025-12-04T10:11:57.9333104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9333167Z     with policy():
2025-12-04T10:11:57.9333462Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9333527Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9334351Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9334358Z 
2025-12-04T10:11:57.9334485Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9335011Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9335014Z 
2025-12-04T10:11:57.9335204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9335329Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9335425Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9335768Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9335933Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9335991Z graph_break []
2025-12-04T10:11:57.9336115Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9336811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9336881Z   if out == self.unknown_value:
2025-12-04T10:11:57.9337006Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9337096Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9337220Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9337567Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9337627Z graph_break []
2025-12-04T10:11:57.9337743Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9338041Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9338120Z Traceback (most recent call last):
2025-12-04T10:11:57.9338422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9338486Z     method(*args, **kwargs)
2025-12-04T10:11:57.9338781Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9338850Z     method(*args, **kwargs)
2025-12-04T10:11:57.9339141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9339232Z     with policy():
2025-12-04T10:11:57.9339530Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9339594Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9340419Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9340424Z 
2025-12-04T10:11:57.9340547Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9341076Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9341082Z 
2025-12-04T10:11:57.9341237Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9341359Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9341451Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9341791Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9341921Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9342019Z graph_break []
2025-12-04T10:11:57.9342144Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9342830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9342934Z   if out == self.unknown_value:
2025-12-04T10:11:57.9343055Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9343144Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9343261Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9343603Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9343661Z graph_break []
2025-12-04T10:11:57.9343782Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9343875Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9343993Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9344336Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9344394Z graph_break []
2025-12-04T10:11:57.9344911Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5d4e24682433b20.xml -
2025-12-04T10:11:57.9345015Z =========================== short test summary info ============================
2025-12-04T10:11:57.9346320Z FAILED [0.4524s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9346359Z 
2025-12-04T10:11:57.9346483Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9347004Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9347008Z 
2025-12-04T10:11:57.9347164Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9347267Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9347380Z ================== 1 failed, 57 deselected, 2 rerun in 11.87s ==================
2025-12-04T10:11:57.9347443Z Got exit code 1
2025-12-04T10:11:57.9347506Z Retrying single test...
2025-12-04T10:11:57.9347770Z W1204 10:07:46.224000 63953 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9348163Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1259359197037313.xml
2025-12-04T10:11:57.9348263Z ============================= test session starts ==============================
2025-12-04T10:11:57.9348470Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9348534Z cachedir: .pytest_cache
2025-12-04T10:11:57.9348840Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9348921Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9348987Z configfile: pytest.ini
2025-12-04T10:11:57.9349370Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9349509Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9350123Z stepcurrent: skipping 52 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9350198Z Running 1 items in this shard
2025-12-04T10:11:57.9350202Z 
2025-12-04T10:11:57.9350937Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:07:47.319603078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9350941Z 
2025-12-04T10:11:57.9351244Z [W1204 10:07:56.477468724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9351248Z 
2025-12-04T10:11:57.9351539Z [W1204 10:07:56.477722819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9351546Z 
2025-12-04T10:11:57.9351834Z [W1204 10:07:56.483524839 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9351837Z 
2025-12-04T10:11:57.9352156Z [W1204 10:07:56.484111629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9352160Z 
2025-12-04T10:11:57.9352450Z [W1204 10:07:56.484273472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9352457Z 
2025-12-04T10:11:57.9352747Z [W1204 10:07:56.489737056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9352750Z 
2025-12-04T10:11:57.9353032Z [W1204 10:07:56.490289755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9353070Z 
2025-12-04T10:11:57.9353359Z [W1204 10:07:56.490455908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9353362Z 
2025-12-04T10:11:57.9353445Z ('RERUN', {'yellow': True}) [11.0591s] [100%]
2025-12-04T10:11:57.9354177Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:07:57.703441072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9354181Z 
2025-12-04T10:11:57.9354469Z [W1204 10:07:57.703979682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9354472Z 
2025-12-04T10:11:57.9354758Z [W1204 10:07:57.704127564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9354763Z 
2025-12-04T10:11:57.9355051Z [W1204 10:07:57.707133417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9355054Z 
2025-12-04T10:11:57.9355345Z [W1204 10:07:57.707704536 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9355348Z 
2025-12-04T10:11:57.9355637Z [W1204 10:07:57.707844749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9355640Z 
2025-12-04T10:11:57.9355967Z [W1204 10:07:57.712448249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9355975Z 
2025-12-04T10:11:57.9356264Z [W1204 10:07:57.712920207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9356300Z 
2025-12-04T10:11:57.9356590Z [W1204 10:07:57.713059949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9356594Z 
2025-12-04T10:11:57.9356678Z ('RERUN', {'yellow': True}) [0.4528s] [100%]
2025-12-04T10:11:57.9357403Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:07:58.153173195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9357406Z 
2025-12-04T10:11:57.9357692Z [W1204 10:07:58.153695885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9357696Z 
2025-12-04T10:11:57.9357981Z [W1204 10:07:58.153835957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9357986Z 
2025-12-04T10:11:57.9358287Z [W1204 10:07:58.156777539 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9358290Z 
2025-12-04T10:11:57.9358612Z [W1204 10:07:58.157337568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9358615Z 
2025-12-04T10:11:57.9358900Z [W1204 10:07:58.157479601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9358906Z 
2025-12-04T10:11:57.9359193Z [W1204 10:07:58.162035471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9359196Z 
2025-12-04T10:11:57.9359483Z [W1204 10:07:58.162493299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9359487Z 
2025-12-04T10:11:57.9359776Z [W1204 10:07:58.162630621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9359816Z 
2025-12-04T10:11:57.9359934Z FAILED [0.4479s] [100%]
2025-12-04T10:11:57.9359938Z 
2025-12-04T10:11:57.9360028Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9360329Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9360405Z Traceback (most recent call last):
2025-12-04T10:11:57.9360721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9360787Z     method(*args, **kwargs)
2025-12-04T10:11:57.9361084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9361149Z     method(*args, **kwargs)
2025-12-04T10:11:57.9361441Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9361508Z     with policy():
2025-12-04T10:11:57.9361811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9361876Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9362732Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9362737Z 
2025-12-04T10:11:57.9362866Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9363394Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9363432Z 
2025-12-04T10:11:57.9363595Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9363725Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9363819Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9364169Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9364301Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9364360Z graph_break []
2025-12-04T10:11:57.9364488Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9365180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9365252Z   if out == self.unknown_value:
2025-12-04T10:11:57.9365584Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9365658Z Traceback (most recent call last):
2025-12-04T10:11:57.9365956Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9366020Z     method(*args, **kwargs)
2025-12-04T10:11:57.9366316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9366380Z     method(*args, **kwargs)
2025-12-04T10:11:57.9366670Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9366731Z     with policy():
2025-12-04T10:11:57.9367067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9367135Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9367973Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9367977Z 
2025-12-04T10:11:57.9368103Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9368628Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9368633Z 
2025-12-04T10:11:57.9368792Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9368915Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9369012Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9369359Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9369484Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9369544Z graph_break []
2025-12-04T10:11:57.9369666Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9370620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9370725Z   if out == self.unknown_value:
2025-12-04T10:11:57.9370848Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9370941Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9371064Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9371408Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9371465Z graph_break []
2025-12-04T10:11:57.9371546Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9371840Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9371913Z Traceback (most recent call last):
2025-12-04T10:11:57.9372209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9372277Z     method(*args, **kwargs)
2025-12-04T10:11:57.9372569Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9372672Z     method(*args, **kwargs)
2025-12-04T10:11:57.9372962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9373020Z     with policy():
2025-12-04T10:11:57.9373316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9373380Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9374204Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9374267Z 
2025-12-04T10:11:57.9374397Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9374920Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9374929Z 
2025-12-04T10:11:57.9375084Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9375208Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9375302Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9375644Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9375767Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9375830Z graph_break []
2025-12-04T10:11:57.9375950Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9376639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9376707Z   if out == self.unknown_value:
2025-12-04T10:11:57.9376828Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9376922Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9377077Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9377419Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9377512Z graph_break []
2025-12-04T10:11:57.9377633Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9377725Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9377845Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9378181Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9378240Z graph_break []
2025-12-04T10:11:57.9378725Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1259359197037313.xml -
2025-12-04T10:11:57.9378828Z =========================== short test summary info ============================
2025-12-04T10:11:57.9380169Z FAILED [0.4479s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9380178Z 
2025-12-04T10:11:57.9380312Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9380831Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9380835Z 
2025-12-04T10:11:57.9380989Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9381094Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9381209Z ================== 1 failed, 57 deselected, 2 rerun in 11.98s ==================
2025-12-04T10:11:57.9381308Z Got exit code 1
2025-12-04T10:11:57.9381781Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9382025Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.9382289Z W1204 10:08:04.740000 64146 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9382675Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50141705a26d91cc.xml
2025-12-04T10:11:57.9382773Z ============================= test session starts ==============================
2025-12-04T10:11:57.9382979Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9383047Z cachedir: .pytest_cache
2025-12-04T10:11:57.9383353Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9383429Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9383495Z configfile: pytest.ini
2025-12-04T10:11:57.9383813Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9383942Z collecting ... collected 58 items / 53 deselected / 5 selected
2025-12-04T10:11:57.9384030Z stepcurrent: skipping 53 already run items.
2025-12-04T10:11:57.9384133Z Running 5 items in this shard
2025-12-04T10:11:57.9384137Z 
2025-12-04T10:11:57.9384633Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8357s] [ 20%]
2025-12-04T10:11:57.9385155Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4588s] [ 20%]
2025-12-04T10:11:57.9385602Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4621s] [ 20%]
2025-12-04T10:11:57.9385607Z 
2025-12-04T10:11:57.9385691Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9385984Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9386057Z Traceback (most recent call last):
2025-12-04T10:11:57.9386365Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9386429Z     method(*args, **kwargs)
2025-12-04T10:11:57.9386726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9386789Z     method(*args, **kwargs)
2025-12-04T10:11:57.9387121Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9387185Z     with policy():
2025-12-04T10:11:57.9387485Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9387552Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9388355Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9388360Z 
2025-12-04T10:11:57.9388486Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9389041Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9389045Z 
2025-12-04T10:11:57.9389201Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9389330Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9389424Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9389773Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9389900Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9389957Z graph_break []
2025-12-04T10:11:57.9390249Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9390323Z Traceback (most recent call last):
2025-12-04T10:11:57.9390625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9390688Z     method(*args, **kwargs)
2025-12-04T10:11:57.9390979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9391042Z     method(*args, **kwargs)
2025-12-04T10:11:57.9391336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9391429Z     with policy():
2025-12-04T10:11:57.9391730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9391793Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9392638Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9392645Z 
2025-12-04T10:11:57.9392767Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9393283Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9393287Z 
2025-12-04T10:11:57.9393446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9393569Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9393664Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9394007Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9394167Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9394228Z graph_break []
2025-12-04T10:11:57.9394348Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9394436Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9394560Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9394900Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9394964Z graph_break []
2025-12-04T10:11:57.9395054Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9395343Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9395455Z Traceback (most recent call last):
2025-12-04T10:11:57.9395754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9395816Z     method(*args, **kwargs)
2025-12-04T10:11:57.9396116Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9396176Z     method(*args, **kwargs)
2025-12-04T10:11:57.9396470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9396533Z     with policy():
2025-12-04T10:11:57.9396824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9396892Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9397704Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9397710Z 
2025-12-04T10:11:57.9397834Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9398347Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9398351Z 
2025-12-04T10:11:57.9398540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9398661Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9398748Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9399125Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9399250Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9399305Z graph_break []
2025-12-04T10:11:57.9399430Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9399520Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9399645Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9400046Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9400105Z graph_break []
2025-12-04T10:11:57.9400226Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9400313Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9400433Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9400829Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9400887Z graph_break []
2025-12-04T10:11:57.9401375Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50141705a26d91cc.xml -
2025-12-04T10:11:57.9401474Z =========================== short test summary info ============================
2025-12-04T10:11:57.9402769Z FAILED [0.4621s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9402813Z 
2025-12-04T10:11:57.9402940Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9403455Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9403462Z 
2025-12-04T10:11:57.9403617Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9403721Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9403842Z ================== 1 failed, 53 deselected, 2 rerun in 2.78s ===================
2025-12-04T10:11:57.9403898Z Got exit code 1
2025-12-04T10:11:57.9403963Z Retrying single test...
2025-12-04T10:11:57.9404229Z W1204 10:08:14.379000 64327 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9404618Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-032da2f374cad8bd.xml
2025-12-04T10:11:57.9404714Z ============================= test session starts ==============================
2025-12-04T10:11:57.9404920Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9404984Z cachedir: .pytest_cache
2025-12-04T10:11:57.9405331Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9405406Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9405469Z configfile: pytest.ini
2025-12-04T10:11:57.9405784Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9405948Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9406522Z stepcurrent: skipping 53 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9406591Z Running 1 items in this shard
2025-12-04T10:11:57.9406595Z 
2025-12-04T10:11:57.9407319Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:08:15.440951911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9407326Z 
2025-12-04T10:11:57.9407623Z [W1204 10:08:24.201539478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9407629Z 
2025-12-04T10:11:57.9407918Z [W1204 10:08:24.201789443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9407922Z 
2025-12-04T10:11:57.9408247Z [W1204 10:08:24.207577313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9408251Z 
2025-12-04T10:11:57.9408538Z [W1204 10:08:24.208195434 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9408541Z 
2025-12-04T10:11:57.9408832Z [W1204 10:08:24.208389067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9408836Z 
2025-12-04T10:11:57.9409122Z [W1204 10:08:24.213915003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9409126Z 
2025-12-04T10:11:57.9409413Z [W1204 10:08:24.214451362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9409526Z 
2025-12-04T10:11:57.9409813Z [W1204 10:08:24.214610445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9409817Z 
2025-12-04T10:11:57.9409900Z ('RERUN', {'yellow': True}) [10.6268s] [100%]
2025-12-04T10:11:57.9410624Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:08:25.386902177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9410628Z 
2025-12-04T10:11:57.9410914Z [W1204 10:08:25.387462466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9410922Z 
2025-12-04T10:11:57.9411209Z [W1204 10:08:25.387607779 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9411213Z 
2025-12-04T10:11:57.9411500Z [W1204 10:08:25.390597100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9411504Z 
2025-12-04T10:11:57.9411807Z [W1204 10:08:25.391184891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9411810Z 
2025-12-04T10:11:57.9412134Z [W1204 10:08:25.391325313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9412138Z 
2025-12-04T10:11:57.9412430Z [W1204 10:08:25.395850121 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9412433Z 
2025-12-04T10:11:57.9412754Z [W1204 10:08:25.396315579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9412759Z 
2025-12-04T10:11:57.9413050Z [W1204 10:08:25.396453611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9413053Z 
2025-12-04T10:11:57.9413131Z ('RERUN', {'yellow': True}) [0.4157s] [100%]
2025-12-04T10:11:57.9413849Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:08:25.801566644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9413856Z 
2025-12-04T10:11:57.9414143Z [W1204 10:08:25.802139494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9414146Z 
2025-12-04T10:11:57.9414433Z [W1204 10:08:25.802282327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9414438Z 
2025-12-04T10:11:57.9414763Z [W1204 10:08:25.805195437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9414767Z 
2025-12-04T10:11:57.9415052Z [W1204 10:08:25.805766467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9415055Z 
2025-12-04T10:11:57.9415342Z [W1204 10:08:25.805906769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9415347Z 
2025-12-04T10:11:57.9415632Z [W1204 10:08:25.810369686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9415635Z 
2025-12-04T10:11:57.9415923Z [W1204 10:08:25.810829724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9415962Z 
2025-12-04T10:11:57.9416250Z [W1204 10:08:25.810965426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9416253Z 
2025-12-04T10:11:57.9416314Z FAILED [0.4102s] [100%]
2025-12-04T10:11:57.9416317Z 
2025-12-04T10:11:57.9416401Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9416696Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9416778Z Traceback (most recent call last):
2025-12-04T10:11:57.9417264Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9417334Z     method(*args, **kwargs)
2025-12-04T10:11:57.9417637Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9417705Z     method(*args, **kwargs)
2025-12-04T10:11:57.9418002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9418061Z     with policy():
2025-12-04T10:11:57.9418356Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9418425Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9419287Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9419292Z 
2025-12-04T10:11:57.9419428Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9419998Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9420004Z 
2025-12-04T10:11:57.9420167Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9420299Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9420396Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9420744Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9420872Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9420939Z graph_break []
2025-12-04T10:11:57.9421073Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9421818Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9421895Z   if out == self.unknown_value:
2025-12-04T10:11:57.9422188Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9422261Z Traceback (most recent call last):
2025-12-04T10:11:57.9422562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9422626Z     method(*args, **kwargs)
2025-12-04T10:11:57.9422923Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9422987Z     method(*args, **kwargs)
2025-12-04T10:11:57.9423283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9423392Z     with policy():
2025-12-04T10:11:57.9423689Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9423754Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9424568Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9424573Z 
2025-12-04T10:11:57.9424702Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9425224Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9425231Z 
2025-12-04T10:11:57.9425386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9425513Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9425607Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9425950Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9426081Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9426175Z graph_break []
2025-12-04T10:11:57.9426300Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9427002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9427124Z   if out == self.unknown_value:
2025-12-04T10:11:57.9427251Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9427342Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9427462Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9427810Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9427867Z graph_break []
2025-12-04T10:11:57.9427956Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9428245Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9428319Z Traceback (most recent call last):
2025-12-04T10:11:57.9428627Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9428690Z     method(*args, **kwargs)
2025-12-04T10:11:57.9429016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9429082Z     method(*args, **kwargs)
2025-12-04T10:11:57.9429371Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9429434Z     with policy():
2025-12-04T10:11:57.9429730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9429794Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9430612Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9430652Z 
2025-12-04T10:11:57.9430776Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9431304Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9431308Z 
2025-12-04T10:11:57.9431467Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9431597Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9431688Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9432024Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9432153Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9432210Z graph_break []
2025-12-04T10:11:57.9432332Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9433022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9433089Z   if out == self.unknown_value:
2025-12-04T10:11:57.9433252Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9433344Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9433463Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9433805Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9433900Z graph_break []
2025-12-04T10:11:57.9434024Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9434113Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9434231Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9434573Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9434629Z graph_break []
2025-12-04T10:11:57.9435116Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-032da2f374cad8bd.xml -
2025-12-04T10:11:57.9435219Z =========================== short test summary info ============================
2025-12-04T10:11:57.9436542Z FAILED [0.4102s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9436552Z 
2025-12-04T10:11:57.9436678Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9437195Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9437199Z 
2025-12-04T10:11:57.9437355Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9437461Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9437609Z ================== 1 failed, 57 deselected, 2 rerun in 11.48s ==================
2025-12-04T10:11:57.9437671Z Got exit code 1
2025-12-04T10:11:57.9437735Z Retrying single test...
2025-12-04T10:11:57.9438000Z W1204 10:08:32.409000 64513 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9438388Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0bdf2ccaad64a4e2.xml
2025-12-04T10:11:57.9438482Z ============================= test session starts ==============================
2025-12-04T10:11:57.9438693Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9438759Z cachedir: .pytest_cache
2025-12-04T10:11:57.9439067Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9439144Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9439208Z configfile: pytest.ini
2025-12-04T10:11:57.9439526Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9439656Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9440276Z stepcurrent: skipping 53 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9440388Z Running 1 items in this shard
2025-12-04T10:11:57.9440391Z 
2025-12-04T10:11:57.9441120Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:08:33.449508012 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9441158Z 
2025-12-04T10:11:57.9441464Z [W1204 10:08:42.526948746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9441468Z 
2025-12-04T10:11:57.9441756Z [W1204 10:08:42.527199120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9441760Z 
2025-12-04T10:11:57.9442052Z [W1204 10:08:42.533029290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9442055Z 
2025-12-04T10:11:57.9442346Z [W1204 10:08:42.533625471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9442349Z 
2025-12-04T10:11:57.9442637Z [W1204 10:08:42.533799394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9442643Z 
2025-12-04T10:11:57.9442932Z [W1204 10:08:42.539234608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9442969Z 
2025-12-04T10:11:57.9443260Z [W1204 10:08:42.539757906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9443264Z 
2025-12-04T10:11:57.9443562Z [W1204 10:08:42.539914159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9443566Z 
2025-12-04T10:11:57.9443648Z ('RERUN', {'yellow': True}) [10.9186s] [100%]
2025-12-04T10:11:57.9444371Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:08:43.704099851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9444414Z 
2025-12-04T10:11:57.9444702Z [W1204 10:08:43.704664431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9444706Z 
2025-12-04T10:11:57.9444996Z [W1204 10:08:43.704802854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9444999Z 
2025-12-04T10:11:57.9445283Z [W1204 10:08:43.707666513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9445286Z 
2025-12-04T10:11:57.9445583Z [W1204 10:08:43.708212622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9445586Z 
2025-12-04T10:11:57.9445872Z [W1204 10:08:43.708358605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9445878Z 
2025-12-04T10:11:57.9446168Z [W1204 10:08:43.712832662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9446172Z 
2025-12-04T10:11:57.9446458Z [W1204 10:08:43.713287830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9446461Z 
2025-12-04T10:11:57.9446745Z [W1204 10:08:43.713423143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9446751Z 
2025-12-04T10:11:57.9446834Z ('RERUN', {'yellow': True}) [0.4129s] [100%]
2025-12-04T10:11:57.9447587Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:08:44.115626285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9447624Z 
2025-12-04T10:11:57.9447915Z [W1204 10:08:44.116180114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9447919Z 
2025-12-04T10:11:57.9448206Z [W1204 10:08:44.116332177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9448209Z 
2025-12-04T10:11:57.9448496Z [W1204 10:08:44.119211277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9448499Z 
2025-12-04T10:11:57.9448786Z [W1204 10:08:44.119765796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9448789Z 
2025-12-04T10:11:57.9449077Z [W1204 10:08:44.119903388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9449084Z 
2025-12-04T10:11:57.9449372Z [W1204 10:08:44.124383315 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9449375Z 
2025-12-04T10:11:57.9449696Z [W1204 10:08:44.124833163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9449700Z 
2025-12-04T10:11:57.9449986Z [W1204 10:08:44.124967326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9449989Z 
2025-12-04T10:11:57.9450049Z FAILED [0.4051s] [100%]
2025-12-04T10:11:57.9450052Z 
2025-12-04T10:11:57.9450140Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9450429Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9450509Z Traceback (most recent call last):
2025-12-04T10:11:57.9450854Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9450921Z     method(*args, **kwargs)
2025-12-04T10:11:57.9451221Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9451283Z     method(*args, **kwargs)
2025-12-04T10:11:57.9451572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9451633Z     with policy():
2025-12-04T10:11:57.9451933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9451999Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9452800Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9452807Z 
2025-12-04T10:11:57.9452936Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9453455Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9453458Z 
2025-12-04T10:11:57.9453615Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9453796Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9453892Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9454236Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9454410Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9454471Z graph_break []
2025-12-04T10:11:57.9454599Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9455295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9455362Z   if out == self.unknown_value:
2025-12-04T10:11:57.9455656Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9455729Z Traceback (most recent call last):
2025-12-04T10:11:57.9456029Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9456095Z     method(*args, **kwargs)
2025-12-04T10:11:57.9456392Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9456457Z     method(*args, **kwargs)
2025-12-04T10:11:57.9456783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9456847Z     with policy():
2025-12-04T10:11:57.9457141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9457205Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9458025Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9458064Z 
2025-12-04T10:11:57.9458191Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9458708Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9458712Z 
2025-12-04T10:11:57.9458866Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9458987Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9459086Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9459432Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9459558Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9459616Z graph_break []
2025-12-04T10:11:57.9459740Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9460433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9460500Z   if out == self.unknown_value:
2025-12-04T10:11:57.9460628Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9460720Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9460879Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9461224Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9461282Z graph_break []
2025-12-04T10:11:57.9461407Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9461698Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9461770Z Traceback (most recent call last):
2025-12-04T10:11:57.9462075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9462136Z     method(*args, **kwargs)
2025-12-04T10:11:57.9462430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9462495Z     method(*args, **kwargs)
2025-12-04T10:11:57.9462783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9462843Z     with policy():
2025-12-04T10:11:57.9463143Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9463222Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9464079Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9464083Z 
2025-12-04T10:11:57.9464206Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9464725Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9464728Z 
2025-12-04T10:11:57.9464881Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9465003Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9465134Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9465478Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9465604Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9465660Z graph_break []
2025-12-04T10:11:57.9465781Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9466473Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9466538Z   if out == self.unknown_value:
2025-12-04T10:11:57.9466668Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9466768Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9466887Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9467232Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9467290Z graph_break []
2025-12-04T10:11:57.9467412Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9467507Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9467628Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9468003Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9468063Z graph_break []
2025-12-04T10:11:57.9468585Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0bdf2ccaad64a4e2.xml -
2025-12-04T10:11:57.9468689Z =========================== short test summary info ============================
2025-12-04T10:11:57.9469974Z FAILED [0.4051s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9469978Z 
2025-12-04T10:11:57.9470106Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9470619Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9470626Z 
2025-12-04T10:11:57.9470816Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9470920Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9471037Z ================== 1 failed, 57 deselected, 2 rerun in 11.76s ==================
2025-12-04T10:11:57.9471099Z Got exit code 1
2025-12-04T10:11:57.9471572Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9471816Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.9472079Z W1204 10:08:50.658000 64699 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9472499Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f7b942a83386066d.xml
2025-12-04T10:11:57.9472599Z ============================= test session starts ==============================
2025-12-04T10:11:57.9472804Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9472868Z cachedir: .pytest_cache
2025-12-04T10:11:57.9473174Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9473248Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9473318Z configfile: pytest.ini
2025-12-04T10:11:57.9473632Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9473758Z collecting ... collected 58 items / 54 deselected / 4 selected
2025-12-04T10:11:57.9473852Z stepcurrent: skipping 54 already run items.
2025-12-04T10:11:57.9473921Z Running 4 items in this shard
2025-12-04T10:11:57.9473924Z 
2025-12-04T10:11:57.9474429Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8644s] [ 25%]
2025-12-04T10:11:57.9474914Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4495s] [ 25%]
2025-12-04T10:11:57.9475393Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4478s] [ 25%]
2025-12-04T10:11:57.9475397Z 
2025-12-04T10:11:57.9475481Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9475805Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9475884Z Traceback (most recent call last):
2025-12-04T10:11:57.9476190Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9476252Z     method(*args, **kwargs)
2025-12-04T10:11:57.9476552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9476613Z     method(*args, **kwargs)
2025-12-04T10:11:57.9476910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9476968Z     with policy():
2025-12-04T10:11:57.9477265Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9477343Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9478183Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9478187Z 
2025-12-04T10:11:57.9478314Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9478828Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9478834Z 
2025-12-04T10:11:57.9478989Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9479117Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9479207Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9479608Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9479735Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9479793Z graph_break []
2025-12-04T10:11:57.9480136Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9480210Z Traceback (most recent call last):
2025-12-04T10:11:57.9480511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9480579Z     method(*args, **kwargs)
2025-12-04T10:11:57.9480872Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9480939Z     method(*args, **kwargs)
2025-12-04T10:11:57.9481232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9481290Z     with policy():
2025-12-04T10:11:57.9481593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9481657Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9482512Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9482517Z 
2025-12-04T10:11:57.9482643Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9483155Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9483201Z 
2025-12-04T10:11:57.9483355Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9483480Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9483573Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9483918Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9484040Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9484105Z graph_break []
2025-12-04T10:11:57.9484226Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9484318Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9484442Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9484786Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9484848Z graph_break []
2025-12-04T10:11:57.9484969Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9485257Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9485333Z Traceback (most recent call last):
2025-12-04T10:11:57.9485627Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9485695Z     method(*args, **kwargs)
2025-12-04T10:11:57.9485989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9486049Z     method(*args, **kwargs)
2025-12-04T10:11:57.9486345Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9486438Z     with policy():
2025-12-04T10:11:57.9486734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9486804Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9487611Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9487615Z 
2025-12-04T10:11:57.9487740Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9488250Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9488256Z 
2025-12-04T10:11:57.9488415Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9488541Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9488630Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9488976Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9489098Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9489196Z graph_break []
2025-12-04T10:11:57.9489319Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9489407Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9489541Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9489918Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9489975Z graph_break []
2025-12-04T10:11:57.9490100Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9490185Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9490305Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9490643Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9490698Z graph_break []
2025-12-04T10:11:57.9491184Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f7b942a83386066d.xml -
2025-12-04T10:11:57.9491287Z =========================== short test summary info ============================
2025-12-04T10:11:57.9492602Z FAILED [0.4478s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9492606Z 
2025-12-04T10:11:57.9492729Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9493251Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9493256Z 
2025-12-04T10:11:57.9493449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9493551Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9493670Z ================== 1 failed, 54 deselected, 2 rerun in 2.79s ===================
2025-12-04T10:11:57.9493801Z Got exit code 1
2025-12-04T10:11:57.9493900Z Retrying single test...
2025-12-04T10:11:57.9494249Z W1204 10:09:00.275000 64880 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9494666Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96d9c65e819c8d75.xml
2025-12-04T10:11:57.9494793Z ============================= test session starts ==============================
2025-12-04T10:11:57.9495267Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9499605Z cachedir: .pytest_cache
2025-12-04T10:11:57.9499978Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9500068Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9500141Z configfile: pytest.ini
2025-12-04T10:11:57.9500478Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9500624Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9501279Z stepcurrent: skipping 54 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9501361Z Running 1 items in this shard
2025-12-04T10:11:57.9501366Z 
2025-12-04T10:11:57.9502104Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:09:01.577120863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9502148Z 
2025-12-04T10:11:57.9502461Z [W1204 10:09:10.773823973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9502465Z 
2025-12-04T10:11:57.9502755Z [W1204 10:09:10.774071868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9502759Z 
2025-12-04T10:11:57.9503052Z [W1204 10:09:10.779838076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9503055Z 
2025-12-04T10:11:57.9503341Z [W1204 10:09:10.780500227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9503345Z 
2025-12-04T10:11:57.9503638Z [W1204 10:09:10.780677250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9503642Z 
2025-12-04T10:11:57.9503963Z [W1204 10:09:10.786075073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9503967Z 
2025-12-04T10:11:57.9504254Z [W1204 10:09:10.786595031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9504258Z 
2025-12-04T10:11:57.9504549Z [W1204 10:09:10.786749924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9504552Z 
2025-12-04T10:11:57.9504637Z ('RERUN', {'yellow': True}) [11.1185s] [100%]
2025-12-04T10:11:57.9505363Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:09:11.772564090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9505401Z 
2025-12-04T10:11:57.9505691Z [W1204 10:09:11.773123570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9505694Z 
2025-12-04T10:11:57.9505984Z [W1204 10:09:11.773268752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9505988Z 
2025-12-04T10:11:57.9506278Z [W1204 10:09:11.776120391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9506281Z 
2025-12-04T10:11:57.9506570Z [W1204 10:09:11.776682350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9506575Z 
2025-12-04T10:11:57.9506860Z [W1204 10:09:11.776821333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9506865Z 
2025-12-04T10:11:57.9507149Z [W1204 10:09:11.781256298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9507156Z 
2025-12-04T10:11:57.9507440Z [W1204 10:09:11.781718706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9507443Z 
2025-12-04T10:11:57.9507761Z [W1204 10:09:11.781853889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9507764Z 
2025-12-04T10:11:57.9507845Z ('RERUN', {'yellow': True}) [0.4083s] [100%]
2025-12-04T10:11:57.9508564Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:09:12.178400151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9508602Z 
2025-12-04T10:11:57.9508893Z [W1204 10:09:12.178970180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9508896Z 
2025-12-04T10:11:57.9509182Z [W1204 10:09:12.179117183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9509185Z 
2025-12-04T10:11:57.9509474Z [W1204 10:09:12.181988422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9509477Z 
2025-12-04T10:11:57.9509760Z [W1204 10:09:12.182541141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9509763Z 
2025-12-04T10:11:57.9510053Z [W1204 10:09:12.182683484 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9510057Z 
2025-12-04T10:11:57.9510381Z [W1204 10:09:12.187068759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9510384Z 
2025-12-04T10:11:57.9510673Z [W1204 10:09:12.187517236 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9510676Z 
2025-12-04T10:11:57.9510961Z [W1204 10:09:12.187654959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9510967Z 
2025-12-04T10:11:57.9511030Z FAILED [0.4037s] [100%]
2025-12-04T10:11:57.9511034Z 
2025-12-04T10:11:57.9511125Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9511422Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9511546Z Traceback (most recent call last):
2025-12-04T10:11:57.9511867Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9511932Z     method(*args, **kwargs)
2025-12-04T10:11:57.9512229Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9512290Z     method(*args, **kwargs)
2025-12-04T10:11:57.9512577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9512642Z     with policy():
2025-12-04T10:11:57.9512947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9513016Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9513813Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9513819Z 
2025-12-04T10:11:57.9513951Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9514478Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9514481Z 
2025-12-04T10:11:57.9514682Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9514821Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9514917Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9515298Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9515433Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9515495Z graph_break []
2025-12-04T10:11:57.9515625Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9516338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9516413Z   if out == self.unknown_value:
2025-12-04T10:11:57.9516717Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9516796Z Traceback (most recent call last):
2025-12-04T10:11:57.9517313Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9517388Z     method(*args, **kwargs)
2025-12-04T10:11:57.9517759Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9517828Z     method(*args, **kwargs)
2025-12-04T10:11:57.9518117Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9518176Z     with policy():
2025-12-04T10:11:57.9518473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9518539Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9519351Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9519411Z 
2025-12-04T10:11:57.9519547Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9520125Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9520131Z 
2025-12-04T10:11:57.9520293Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9520424Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9520521Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9520873Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9521004Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9521069Z graph_break []
2025-12-04T10:11:57.9521195Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9521891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9521964Z   if out == self.unknown_value:
2025-12-04T10:11:57.9522086Z ----------------------------- Captured stdout call -----------------------------
﻿2025-12-04T10:11:57.9523906Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9524067Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9524433Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9524582Z graph_break []
2025-12-04T10:11:57.9524682Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9524997Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9525083Z Traceback (most recent call last):
2025-12-04T10:11:57.9525408Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9525483Z     method(*args, **kwargs)
2025-12-04T10:11:57.9525790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9525874Z     method(*args, **kwargs)
2025-12-04T10:11:57.9526168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9526230Z     with policy():
2025-12-04T10:11:57.9526525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9526597Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9527457Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9527463Z 
2025-12-04T10:11:57.9527603Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9528141Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9528146Z 
2025-12-04T10:11:57.9528312Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9528450Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9528550Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9528906Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9529040Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9529102Z graph_break []
2025-12-04T10:11:57.9529235Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9529941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9530019Z   if out == self.unknown_value:
2025-12-04T10:11:57.9530144Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9530243Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9530376Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9530723Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9530785Z graph_break []
2025-12-04T10:11:57.9530908Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9531034Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9531219Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9531559Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9531655Z graph_break []
2025-12-04T10:11:57.9532155Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96d9c65e819c8d75.xml -
2025-12-04T10:11:57.9532258Z =========================== short test summary info ============================
2025-12-04T10:11:57.9533549Z FAILED [0.4037s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9533556Z 
2025-12-04T10:11:57.9533684Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9534211Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9534249Z 
2025-12-04T10:11:57.9534408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9534513Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9534635Z ================== 1 failed, 57 deselected, 2 rerun in 11.95s ==================
2025-12-04T10:11:57.9534695Z Got exit code 1
2025-12-04T10:11:57.9534762Z Retrying single test...
2025-12-04T10:11:57.9535035Z W1204 10:09:18.784000 65066 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9535425Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86387a3ec48a5612.xml
2025-12-04T10:11:57.9535526Z ============================= test session starts ==============================
2025-12-04T10:11:57.9535739Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9535809Z cachedir: .pytest_cache
2025-12-04T10:11:57.9536120Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9536197Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9536275Z configfile: pytest.ini
2025-12-04T10:11:57.9536592Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9536725Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9537296Z stepcurrent: skipping 54 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9537369Z Running 1 items in this shard
2025-12-04T10:11:57.9537372Z 
2025-12-04T10:11:57.9538107Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:09:20.066412167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9538112Z 
2025-12-04T10:11:57.9538411Z [W1204 10:09:29.070564571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9538416Z 
2025-12-04T10:11:57.9538796Z [W1204 10:09:29.070818426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9538800Z 
2025-12-04T10:11:57.9539088Z [W1204 10:09:29.077168165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9539127Z 
2025-12-04T10:11:57.9539418Z [W1204 10:09:29.077761495 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9539427Z 
2025-12-04T10:11:57.9539714Z [W1204 10:09:29.077927928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9539717Z 
2025-12-04T10:11:57.9540003Z [W1204 10:09:29.083409683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9540006Z 
2025-12-04T10:11:57.9540312Z [W1204 10:09:29.083958322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9540316Z 
2025-12-04T10:11:57.9540603Z [W1204 10:09:29.084116265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9540608Z 
2025-12-04T10:11:57.9540693Z ('RERUN', {'yellow': True}) [10.9128s] [100%]
2025-12-04T10:11:57.9541450Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:09:30.079194891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9541455Z 
2025-12-04T10:11:57.9541746Z [W1204 10:09:30.079761421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9541750Z 
2025-12-04T10:11:57.9542039Z [W1204 10:09:30.079908963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9542042Z 
2025-12-04T10:11:57.9542337Z [W1204 10:09:30.082839953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9542342Z 
2025-12-04T10:11:57.9542628Z [W1204 10:09:30.083403853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9542633Z 
2025-12-04T10:11:57.9542917Z [W1204 10:09:30.083546225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9542923Z 
2025-12-04T10:11:57.9543207Z [W1204 10:09:30.087972051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9543211Z 
2025-12-04T10:11:57.9543496Z [W1204 10:09:30.088434889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9543501Z 
2025-12-04T10:11:57.9543791Z [W1204 10:09:30.088575941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9543795Z 
2025-12-04T10:11:57.9543875Z ('RERUN', {'yellow': True}) [0.4185s] [100%]
2025-12-04T10:11:57.9544606Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:09:30.494506358 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9544610Z 
2025-12-04T10:11:57.9544904Z [W1204 10:09:30.495074797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9544908Z 
2025-12-04T10:11:57.9545233Z [W1204 10:09:30.495224600 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9545273Z 
2025-12-04T10:11:57.9545571Z [W1204 10:09:30.498159521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9545607Z 
2025-12-04T10:11:57.9545902Z [W1204 10:09:30.498728040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9545906Z 
2025-12-04T10:11:57.9546200Z [W1204 10:09:30.498872043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9546203Z 
2025-12-04T10:11:57.9546492Z [W1204 10:09:30.503430741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9546496Z 
2025-12-04T10:11:57.9546787Z [W1204 10:09:30.503899300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9546792Z 
2025-12-04T10:11:57.9547079Z [W1204 10:09:30.504038762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9547084Z 
2025-12-04T10:11:57.9547151Z FAILED [0.4131s] [100%]
2025-12-04T10:11:57.9547154Z 
2025-12-04T10:11:57.9547239Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9547572Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9547650Z Traceback (most recent call last):
2025-12-04T10:11:57.9547967Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9548036Z     method(*args, **kwargs)
2025-12-04T10:11:57.9548333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9548398Z     method(*args, **kwargs)
2025-12-04T10:11:57.9548690Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9548752Z     with policy():
2025-12-04T10:11:57.9549052Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9549125Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9549922Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9549927Z 
2025-12-04T10:11:57.9550061Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9550582Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9550586Z 
2025-12-04T10:11:57.9550751Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9550881Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9550980Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9551337Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9551466Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9551529Z graph_break []
2025-12-04T10:11:57.9551654Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9552448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9552560Z   if out == self.unknown_value:
2025-12-04T10:11:57.9552853Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9552930Z Traceback (most recent call last):
2025-12-04T10:11:57.9553235Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9553300Z     method(*args, **kwargs)
2025-12-04T10:11:57.9553601Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9553663Z     method(*args, **kwargs)
2025-12-04T10:11:57.9553954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9554020Z     with policy():
2025-12-04T10:11:57.9554314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9554386Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9555232Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9555237Z 
2025-12-04T10:11:57.9555366Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9555888Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9555895Z 
2025-12-04T10:11:57.9556052Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9556180Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9556278Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9556629Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9556755Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9556813Z graph_break []
2025-12-04T10:11:57.9556940Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9557635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9557717Z   if out == self.unknown_value:
2025-12-04T10:11:57.9557839Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9557932Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9558061Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9558405Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9558473Z graph_break []
2025-12-04T10:11:57.9558564Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9558858Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9558935Z Traceback (most recent call last):
2025-12-04T10:11:57.9559317Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9559384Z     method(*args, **kwargs)
2025-12-04T10:11:57.9559683Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9559782Z     method(*args, **kwargs)
2025-12-04T10:11:57.9560199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9560265Z     with policy():
2025-12-04T10:11:57.9560572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9560641Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9561464Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9561470Z 
2025-12-04T10:11:57.9561596Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9562122Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9562125Z 
2025-12-04T10:11:57.9562324Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9562457Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9562550Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9562900Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9563031Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9563089Z graph_break []
2025-12-04T10:11:57.9563217Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9563917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9563988Z   if out == self.unknown_value:
2025-12-04T10:11:57.9564125Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9564220Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9564346Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9564690Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9564750Z graph_break []
2025-12-04T10:11:57.9564874Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9564965Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9565086Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9565430Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9565490Z graph_break []
2025-12-04T10:11:57.9565982Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86387a3ec48a5612.xml -
2025-12-04T10:11:57.9566082Z =========================== short test summary info ============================
2025-12-04T10:11:57.9567415Z FAILED [0.4131s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9567488Z 
2025-12-04T10:11:57.9567615Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9568141Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9568149Z 
2025-12-04T10:11:57.9568307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9568412Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9568530Z ================== 1 failed, 57 deselected, 2 rerun in 11.77s ==================
2025-12-04T10:11:57.9568588Z Got exit code 1
2025-12-04T10:11:57.9569058Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9569351Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.9569620Z W1204 10:09:37.084000 65252 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9570211Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86b90fcd4d18651c.xml
2025-12-04T10:11:57.9570323Z ============================= test session starts ==============================
2025-12-04T10:11:57.9570540Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9570615Z cachedir: .pytest_cache
2025-12-04T10:11:57.9570923Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9571005Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9571071Z configfile: pytest.ini
2025-12-04T10:11:57.9571389Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9571522Z collecting ... collected 58 items / 55 deselected / 3 selected
2025-12-04T10:11:57.9571609Z stepcurrent: skipping 55 already run items.
2025-12-04T10:11:57.9571680Z Running 3 items in this shard
2025-12-04T10:11:57.9571684Z 
2025-12-04T10:11:57.9572194Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8789s] [ 33%]
2025-12-04T10:11:57.9572687Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4977s] [ 33%]
2025-12-04T10:11:57.9573135Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.5175s] [ 33%]
2025-12-04T10:11:57.9573141Z 
2025-12-04T10:11:57.9573226Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9573524Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9573599Z Traceback (most recent call last):
2025-12-04T10:11:57.9573972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9574082Z     method(*args, **kwargs)
2025-12-04T10:11:57.9574381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9574481Z     method(*args, **kwargs)
2025-12-04T10:11:57.9574773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9574835Z     with policy():
2025-12-04T10:11:57.9575134Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9575201Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9576008Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9576017Z 
2025-12-04T10:11:57.9576148Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9576670Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9576676Z 
2025-12-04T10:11:57.9576838Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9576999Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9577099Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9577444Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9577576Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9577644Z graph_break []
2025-12-04T10:11:57.9577944Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9578027Z Traceback (most recent call last):
2025-12-04T10:11:57.9578333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9578397Z     method(*args, **kwargs)
2025-12-04T10:11:57.9578694Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9578757Z     method(*args, **kwargs)
2025-12-04T10:11:57.9579046Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9579110Z     with policy():
2025-12-04T10:11:57.9579405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9579473Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9580296Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9580301Z 
2025-12-04T10:11:57.9580429Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9580958Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9580962Z 
2025-12-04T10:11:57.9581121Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9581303Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9581435Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9581780Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9581943Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9582001Z graph_break []
2025-12-04T10:11:57.9582129Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9582218Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9582339Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9582680Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9582738Z graph_break []
2025-12-04T10:11:57.9582824Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9583121Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9583194Z Traceback (most recent call last):
2025-12-04T10:11:57.9583497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9583566Z     method(*args, **kwargs)
2025-12-04T10:11:57.9583896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9583966Z     method(*args, **kwargs)
2025-12-04T10:11:57.9584256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9584316Z     with policy():
2025-12-04T10:11:57.9584614Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9584680Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9585508Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9585513Z 
2025-12-04T10:11:57.9585649Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9586176Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9586180Z 
2025-12-04T10:11:57.9586335Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9586462Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9586560Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9586905Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9587037Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9587094Z graph_break []
2025-12-04T10:11:57.9587217Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9587309Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9587438Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9587781Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9587839Z graph_break []
2025-12-04T10:11:57.9588040Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9588127Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9588252Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9588626Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9588684Z graph_break []
2025-12-04T10:11:57.9589179Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86b90fcd4d18651c.xml -
2025-12-04T10:11:57.9589280Z =========================== short test summary info ============================
2025-12-04T10:11:57.9590583Z FAILED [0.5175s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9590592Z 
2025-12-04T10:11:57.9590715Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9591269Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9591278Z 
2025-12-04T10:11:57.9591434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9591536Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9591657Z ================== 1 failed, 55 deselected, 2 rerun in 2.92s ===================
2025-12-04T10:11:57.9591717Z Got exit code 1
2025-12-04T10:11:57.9591781Z Retrying single test...
2025-12-04T10:11:57.9592045Z W1204 10:09:46.794000 65441 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9592431Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c3b1417ea80e2f0.xml
2025-12-04T10:11:57.9592531Z ============================= test session starts ==============================
2025-12-04T10:11:57.9592739Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9592805Z cachedir: .pytest_cache
2025-12-04T10:11:57.9593109Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9593186Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9593253Z configfile: pytest.ini
2025-12-04T10:11:57.9593572Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9593712Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9594292Z stepcurrent: skipping 55 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9594364Z Running 1 items in this shard
2025-12-04T10:11:57.9594368Z 
2025-12-04T10:11:57.9595099Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:09:47.892578085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9595109Z 
2025-12-04T10:11:57.9595446Z [W1204 10:09:57.068279538 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9595486Z 
2025-12-04T10:11:57.9595779Z [W1204 10:09:57.068558242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9595815Z 
2025-12-04T10:11:57.9596108Z [W1204 10:09:57.075012090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9596112Z 
2025-12-04T10:11:57.9596401Z [W1204 10:09:57.075584810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9596405Z 
2025-12-04T10:11:57.9596694Z [W1204 10:09:57.075754193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9596697Z 
2025-12-04T10:11:57.9596984Z [W1204 10:09:57.081245885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9596989Z 
2025-12-04T10:11:57.9597278Z [W1204 10:09:57.081785454 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9597283Z 
2025-12-04T10:11:57.9597571Z [W1204 10:09:57.081942927 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9597574Z 
2025-12-04T10:11:57.9597698Z ('RERUN', {'yellow': True}) [11.0769s] [100%]
2025-12-04T10:11:57.9598424Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:09:58.288243831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9598428Z 
2025-12-04T10:11:57.9598717Z [W1204 10:09:58.288780290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9598725Z 
2025-12-04T10:11:57.9599012Z [W1204 10:09:58.288919532 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9599016Z 
2025-12-04T10:11:57.9599301Z [W1204 10:09:58.291906082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9599304Z 
2025-12-04T10:11:57.9599594Z [W1204 10:09:58.292480081 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9599597Z 
2025-12-04T10:11:57.9599957Z [W1204 10:09:58.292619214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9599961Z 
2025-12-04T10:11:57.9600258Z [W1204 10:09:58.297128598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9600264Z 
2025-12-04T10:11:57.9600550Z [W1204 10:09:58.297588596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9600556Z 
2025-12-04T10:11:57.9600845Z [W1204 10:09:58.297724198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9600848Z 
2025-12-04T10:11:57.9600928Z ('RERUN', {'yellow': True}) [0.4529s] [100%]
2025-12-04T10:11:57.9601651Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:09:58.738491187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9601657Z 
2025-12-04T10:11:57.9601985Z [W1204 10:09:58.739009685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9602025Z 
2025-12-04T10:11:57.9602313Z [W1204 10:09:58.739149818 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9602349Z 
2025-12-04T10:11:57.9602641Z [W1204 10:09:58.741994985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9602645Z 
2025-12-04T10:11:57.9602933Z [W1204 10:09:58.742536234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9602936Z 
2025-12-04T10:11:57.9603231Z [W1204 10:09:58.742674617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9603234Z 
2025-12-04T10:11:57.9603526Z [W1204 10:09:58.747022159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9603530Z 
2025-12-04T10:11:57.9603820Z [W1204 10:09:58.747467676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9603825Z 
2025-12-04T10:11:57.9604117Z [W1204 10:09:58.747610408 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9604120Z 
2025-12-04T10:11:57.9604187Z FAILED [0.4481s] [100%]
2025-12-04T10:11:57.9604243Z 
2025-12-04T10:11:57.9604331Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9604628Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9604706Z Traceback (most recent call last):
2025-12-04T10:11:57.9605014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9605084Z     method(*args, **kwargs)
2025-12-04T10:11:57.9605384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9605450Z     method(*args, **kwargs)
2025-12-04T10:11:57.9605748Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9605808Z     with policy():
2025-12-04T10:11:57.9606103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9606171Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9606979Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9606984Z 
2025-12-04T10:11:57.9607113Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9607636Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9607641Z 
2025-12-04T10:11:57.9607805Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9607934Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9608036Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9608397Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9608568Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9608741Z graph_break []
2025-12-04T10:11:57.9608870Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9609560Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9609668Z   if out == self.unknown_value:
2025-12-04T10:11:57.9609965Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9610041Z Traceback (most recent call last):
2025-12-04T10:11:57.9610345Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9610409Z     method(*args, **kwargs)
2025-12-04T10:11:57.9610709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9610773Z     method(*args, **kwargs)
2025-12-04T10:11:57.9611065Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9611129Z     with policy():
2025-12-04T10:11:57.9611428Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9611497Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9612352Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9612356Z 
2025-12-04T10:11:57.9612484Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9613010Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9613015Z 
2025-12-04T10:11:57.9613172Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9613300Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9613393Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9613741Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9613872Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9613932Z graph_break []
2025-12-04T10:11:57.9614057Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9614751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9614823Z   if out == self.unknown_value:
2025-12-04T10:11:57.9614950Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9615038Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9615164Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9615510Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9615569Z graph_break []
2025-12-04T10:11:57.9615656Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9615987Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9616099Z Traceback (most recent call last):
2025-12-04T10:11:57.9616402Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9616499Z     method(*args, **kwargs)
2025-12-04T10:11:57.9616789Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9616857Z     method(*args, **kwargs)
2025-12-04T10:11:57.9617339Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9617405Z     with policy():
2025-12-04T10:11:57.9617699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9617764Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9618616Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9618622Z 
2025-12-04T10:11:57.9618748Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9619343Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9619347Z 
2025-12-04T10:11:57.9619510Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9619638Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9619730Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9620072Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9620200Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9620260Z graph_break []
2025-12-04T10:11:57.9620381Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9621078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9621149Z   if out == self.unknown_value:
2025-12-04T10:11:57.9621277Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9621367Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9621492Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9621836Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9621897Z graph_break []
2025-12-04T10:11:57.9622022Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9622110Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9622232Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9622572Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9622630Z graph_break []
2025-12-04T10:11:57.9623171Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c3b1417ea80e2f0.xml -
2025-12-04T10:11:57.9623328Z =========================== short test summary info ============================
2025-12-04T10:11:57.9624632Z FAILED [0.4481s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9624686Z 
2025-12-04T10:11:57.9624811Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9625330Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9625336Z 
2025-12-04T10:11:57.9625493Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9625597Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9625719Z ================== 1 failed, 57 deselected, 2 rerun in 12.00s ==================
2025-12-04T10:11:57.9625777Z Got exit code 1
2025-12-04T10:11:57.9625843Z Retrying single test...
2025-12-04T10:11:57.9626143Z W1204 10:10:05.333000 65634 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9626528Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7815a5e2a911334a.xml
2025-12-04T10:11:57.9626622Z ============================= test session starts ==============================
2025-12-04T10:11:57.9626833Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9626900Z cachedir: .pytest_cache
2025-12-04T10:11:57.9627205Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9627283Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9627348Z configfile: pytest.ini
2025-12-04T10:11:57.9627667Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9627798Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9628379Z stepcurrent: skipping 55 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9628454Z Running 1 items in this shard
2025-12-04T10:11:57.9628457Z 
2025-12-04T10:11:57.9629189Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:10:06.422887374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9629196Z 
2025-12-04T10:11:57.9629509Z [W1204 10:10:15.582057755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9629513Z 
2025-12-04T10:11:57.9629806Z [W1204 10:10:15.582312549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9629809Z 
2025-12-04T10:11:57.9630101Z [W1204 10:10:15.588201890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9630104Z 
2025-12-04T10:11:57.9630432Z [W1204 10:10:15.588792500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9630472Z 
2025-12-04T10:11:57.9630765Z [W1204 10:10:15.588958443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9630822Z 
2025-12-04T10:11:57.9631109Z [W1204 10:10:15.594491878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9631112Z 
2025-12-04T10:11:57.9631405Z [W1204 10:10:15.595024677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9631408Z 
2025-12-04T10:11:57.9631693Z [W1204 10:10:15.595182590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9631696Z 
2025-12-04T10:11:57.9631778Z ('RERUN', {'yellow': True}) [11.0542s] [100%]
2025-12-04T10:11:57.9632510Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:10:16.806290806 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9632516Z 
2025-12-04T10:11:57.9632802Z [W1204 10:10:16.806832016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9632805Z 
2025-12-04T10:11:57.9633130Z [W1204 10:10:16.806975998 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9633134Z 
2025-12-04T10:11:57.9633421Z [W1204 10:10:16.809872858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9633424Z 
2025-12-04T10:11:57.9633725Z [W1204 10:10:16.810452908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9633730Z 
2025-12-04T10:11:57.9634019Z [W1204 10:10:16.810596890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9634022Z 
2025-12-04T10:11:57.9634317Z [W1204 10:10:16.815034216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9634320Z 
2025-12-04T10:11:57.9634606Z [W1204 10:10:16.815487474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9634609Z 
2025-12-04T10:11:57.9634894Z [W1204 10:10:16.815623937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9634900Z 
2025-12-04T10:11:57.9634979Z ('RERUN', {'yellow': True}) [0.4506s] [100%]
2025-12-04T10:11:57.9635700Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:10:17.254901241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9635706Z 
2025-12-04T10:11:57.9635998Z [W1204 10:10:17.255417410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9636001Z 
2025-12-04T10:11:57.9636289Z [W1204 10:10:17.255560802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9636292Z 
2025-12-04T10:11:57.9636584Z [W1204 10:10:17.258397141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9636587Z 
2025-12-04T10:11:57.9636873Z [W1204 10:10:17.258932500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9636947Z 
2025-12-04T10:11:57.9637239Z [W1204 10:10:17.259069612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9637243Z 
2025-12-04T10:11:57.9637528Z [W1204 10:10:17.263492198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9637564Z 
2025-12-04T10:11:57.9637864Z [W1204 10:10:17.263939276 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9637867Z 
2025-12-04T10:11:57.9638153Z [W1204 10:10:17.264075038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9638157Z 
2025-12-04T10:11:57.9638218Z FAILED [0.4456s] [100%]
2025-12-04T10:11:57.9638221Z 
2025-12-04T10:11:57.9638320Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9638623Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9638700Z Traceback (most recent call last):
2025-12-04T10:11:57.9639013Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9639079Z     method(*args, **kwargs)
2025-12-04T10:11:57.9639415Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9639478Z     method(*args, **kwargs)
2025-12-04T10:11:57.9639773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9639833Z     with policy():
2025-12-04T10:11:57.9640177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9640253Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9641061Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9641067Z 
2025-12-04T10:11:57.9641199Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9641723Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9641727Z 
2025-12-04T10:11:57.9641886Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9642029Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9642131Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9642482Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9642610Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9642669Z graph_break []
2025-12-04T10:11:57.9642796Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9643489Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9643564Z   if out == self.unknown_value:
2025-12-04T10:11:57.9643860Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9644023Z Traceback (most recent call last):
2025-12-04T10:11:57.9644326Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9644390Z     method(*args, **kwargs)
2025-12-04T10:11:57.9644716Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9644783Z     method(*args, **kwargs)
2025-12-04T10:11:57.9645073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9645135Z     with policy():
2025-12-04T10:11:57.9645428Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9645494Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9646318Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9646324Z 
2025-12-04T10:11:57.9646450Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9647009Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9647014Z 
2025-12-04T10:11:57.9647172Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9647297Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9647394Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9647740Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9647872Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9647929Z graph_break []
2025-12-04T10:11:57.9648052Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9648745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9648815Z   if out == self.unknown_value:
2025-12-04T10:11:57.9648939Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9649030Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9649152Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9649499Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9649558Z graph_break []
2025-12-04T10:11:57.9649643Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9649940Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T10:11:57.9650015Z Traceback (most recent call last):
2025-12-04T10:11:57.9650318Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9650383Z     method(*args, **kwargs)
2025-12-04T10:11:57.9650677Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9650741Z     method(*args, **kwargs)
2025-12-04T10:11:57.9651072Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9651173Z     with policy():
2025-12-04T10:11:57.9651479Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9651579Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9652402Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9652406Z 
2025-12-04T10:11:57.9652532Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9653058Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9653063Z 
2025-12-04T10:11:57.9653217Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9653342Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9653436Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9653777Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9653936Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9653997Z graph_break []
2025-12-04T10:11:57.9654121Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9654811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9654881Z   if out == self.unknown_value:
2025-12-04T10:11:57.9655004Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9655094Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9655220Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9655565Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9655624Z graph_break []
2025-12-04T10:11:57.9655744Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9655837Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9655957Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9656303Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9656362Z graph_break []
2025-12-04T10:11:57.9656850Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7815a5e2a911334a.xml -
2025-12-04T10:11:57.9656955Z =========================== short test summary info ============================
2025-12-04T10:11:57.9658415Z FAILED [0.4456s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9658480Z 
2025-12-04T10:11:57.9658632Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9659255Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9659298Z 
2025-12-04T10:11:57.9659489Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9659613Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9659748Z ================== 1 failed, 57 deselected, 2 rerun in 11.97s ==================
2025-12-04T10:11:57.9659822Z Got exit code 1
2025-12-04T10:11:57.9660314Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T10:11:57.9660561Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.9660836Z W1204 10:10:23.888000 65827 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9661228Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8773df7cdfc9f682.xml
2025-12-04T10:11:57.9661327Z ============================= test session starts ==============================
2025-12-04T10:11:57.9661569Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9661637Z cachedir: .pytest_cache
2025-12-04T10:11:57.9661945Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9662021Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9662088Z configfile: pytest.ini
2025-12-04T10:11:57.9662405Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9662535Z collecting ... collected 58 items / 56 deselected / 2 selected
2025-12-04T10:11:57.9662626Z stepcurrent: skipping 56 already run items.
2025-12-04T10:11:57.9662697Z Running 2 items in this shard
2025-12-04T10:11:57.9662700Z 
2025-12-04T10:11:57.9663199Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8403s] [ 50%]
2025-12-04T10:11:57.9663682Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4514s] [ 50%]
2025-12-04T10:11:57.9664126Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4392s] [ 50%]
2025-12-04T10:11:57.9664135Z 
2025-12-04T10:11:57.9664218Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9664507Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9664587Z Traceback (most recent call last):
2025-12-04T10:11:57.9664896Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9664972Z     method(*args, **kwargs)
2025-12-04T10:11:57.9665274Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9665339Z     method(*args, **kwargs)
2025-12-04T10:11:57.9665630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9665729Z     with policy():
2025-12-04T10:11:57.9666059Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9666129Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9666927Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9666964Z 
2025-12-04T10:11:57.9667094Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9667629Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9667633Z 
2025-12-04T10:11:57.9667821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9667977Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9668094Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9668508Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9668660Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9668769Z graph_break []
2025-12-04T10:11:57.9669113Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9669199Z Traceback (most recent call last):
2025-12-04T10:11:57.9669544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9669608Z     method(*args, **kwargs)
2025-12-04T10:11:57.9669904Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9669971Z     method(*args, **kwargs)
2025-12-04T10:11:57.9670259Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9670319Z     with policy():
2025-12-04T10:11:57.9670619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9670686Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9671496Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9671500Z 
2025-12-04T10:11:57.9671627Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9672144Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9672152Z 
2025-12-04T10:11:57.9672308Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9672432Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9672531Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9672876Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9673002Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9673064Z graph_break []
2025-12-04T10:11:57.9673225Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9673350Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9673470Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9673845Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9673909Z graph_break []
2025-12-04T10:11:57.9673993Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9674280Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9674358Z Traceback (most recent call last):
2025-12-04T10:11:57.9674657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9674724Z     method(*args, **kwargs)
2025-12-04T10:11:57.9675014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9675088Z     method(*args, **kwargs)
2025-12-04T10:11:57.9675384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9675446Z     with policy():
2025-12-04T10:11:57.9675778Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9675846Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9676649Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9676652Z 
2025-12-04T10:11:57.9676782Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9677296Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9677301Z 
2025-12-04T10:11:57.9677458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9677589Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9677683Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9678028Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9678151Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9678215Z graph_break []
2025-12-04T10:11:57.9678339Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9678428Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9678555Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9678895Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9678953Z graph_break []
2025-12-04T10:11:57.9679079Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9679168Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9679292Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9679633Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9679692Z graph_break []
2025-12-04T10:11:57.9680302Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8773df7cdfc9f682.xml -
2025-12-04T10:11:57.9680404Z =========================== short test summary info ============================
2025-12-04T10:11:57.9681719Z FAILED [0.4392s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9681723Z 
2025-12-04T10:11:57.9681846Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9682371Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9682377Z 
2025-12-04T10:11:57.9682528Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9682634Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9682755Z ================== 1 failed, 56 deselected, 2 rerun in 2.75s ===================
2025-12-04T10:11:57.9682847Z Got exit code 1
2025-12-04T10:11:57.9682917Z Retrying single test...
2025-12-04T10:11:57.9683181Z W1204 10:10:33.567000 66008 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9683569Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f18b468d408a9813.xml
2025-12-04T10:11:57.9683669Z ============================= test session starts ==============================
2025-12-04T10:11:57.9683879Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9683944Z cachedir: .pytest_cache
2025-12-04T10:11:57.9684265Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9684342Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9684410Z configfile: pytest.ini
2025-12-04T10:11:57.9684732Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9684860Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9685429Z stepcurrent: skipping 56 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9685500Z Running 1 items in this shard
2025-12-04T10:11:57.9685504Z 
2025-12-04T10:11:57.9686234Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:10:34.607733399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9686239Z 
2025-12-04T10:11:57.9686539Z [W1204 10:10:43.607510043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9686542Z 
2025-12-04T10:11:57.9686833Z [W1204 10:10:43.607762847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9686836Z 
2025-12-04T10:11:57.9687126Z [W1204 10:10:43.613899272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9687216Z 
2025-12-04T10:11:57.9687504Z [W1204 10:10:43.614487722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9687510Z 
2025-12-04T10:11:57.9687798Z [W1204 10:10:43.614659345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9687837Z 
2025-12-04T10:11:57.9688126Z [W1204 10:10:43.620094798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9688129Z 
2025-12-04T10:11:57.9688421Z [W1204 10:10:43.620623937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9688424Z 
2025-12-04T10:11:57.9688710Z [W1204 10:10:43.620778299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9688715Z 
2025-12-04T10:11:57.9688800Z ('RERUN', {'yellow': True}) [10.8465s] [100%]
2025-12-04T10:11:57.9689525Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:10:44.789419187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9689531Z 
2025-12-04T10:11:57.9689855Z [W1204 10:10:44.789979377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9689860Z 
2025-12-04T10:11:57.9690148Z [W1204 10:10:44.790138480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9690151Z 
2025-12-04T10:11:57.9690442Z [W1204 10:10:44.793032309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9690448Z 
2025-12-04T10:11:57.9690733Z [W1204 10:10:44.793575788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9690737Z 
2025-12-04T10:11:57.9691020Z [W1204 10:10:44.793717881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9691028Z 
2025-12-04T10:11:57.9691316Z [W1204 10:10:44.798132966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9691319Z 
2025-12-04T10:11:57.9691603Z [W1204 10:10:44.798583064 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9691606Z 
2025-12-04T10:11:57.9691897Z [W1204 10:10:44.798719316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9691900Z 
2025-12-04T10:11:57.9691993Z ('RERUN', {'yellow': True}) [0.4057s] [100%]
2025-12-04T10:11:57.9692713Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:10:45.192145823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9692719Z 
2025-12-04T10:11:57.9693008Z [W1204 10:10:45.192713263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9693011Z 
2025-12-04T10:11:57.9693303Z [W1204 10:10:45.192853925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9693307Z 
2025-12-04T10:11:57.9693592Z [W1204 10:10:45.195685723 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9693595Z 
2025-12-04T10:11:57.9693954Z [W1204 10:10:45.196222783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9693963Z 
2025-12-04T10:11:57.9694249Z [W1204 10:10:45.196368615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9694285Z 
2025-12-04T10:11:57.9694572Z [W1204 10:10:45.200741960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9694577Z 
2025-12-04T10:11:57.9694868Z [W1204 10:10:45.201197297 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9694871Z 
2025-12-04T10:11:57.9695158Z [W1204 10:10:45.201335310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9695161Z 
2025-12-04T10:11:57.9695227Z FAILED [0.4040s] [100%]
2025-12-04T10:11:57.9695232Z 
2025-12-04T10:11:57.9695317Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9695612Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9695687Z Traceback (most recent call last):
2025-12-04T10:11:57.9695996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9696101Z     method(*args, **kwargs)
2025-12-04T10:11:57.9696397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9696461Z     method(*args, **kwargs)
2025-12-04T10:11:57.9696764Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9696825Z     with policy():
2025-12-04T10:11:57.9697129Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9697197Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9697989Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9697996Z 
2025-12-04T10:11:57.9698127Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9698645Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9698649Z 
2025-12-04T10:11:57.9698813Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9698941Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9699039Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9699391Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9699520Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9699585Z graph_break []
2025-12-04T10:11:57.9699710Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9700403Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9700476Z   if out == self.unknown_value:
2025-12-04T10:11:57.9700836Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9700914Z Traceback (most recent call last):
2025-12-04T10:11:57.9701214Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9701311Z     method(*args, **kwargs)
2025-12-04T10:11:57.9701609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9701672Z     method(*args, **kwargs)
2025-12-04T10:11:57.9701963Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9702026Z     with policy():
2025-12-04T10:11:57.9702322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9702390Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9703194Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9703199Z 
2025-12-04T10:11:57.9703329Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9703880Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9703884Z 
2025-12-04T10:11:57.9704042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9704172Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9704267Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9704615Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9704741Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9704802Z graph_break []
2025-12-04T10:11:57.9704930Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9705620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9705691Z   if out == self.unknown_value:
2025-12-04T10:11:57.9705829Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9705920Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9706047Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9706388Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9706448Z graph_break []
2025-12-04T10:11:57.9706540Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9706830Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9706906Z Traceback (most recent call last):
2025-12-04T10:11:57.9707205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9707270Z     method(*args, **kwargs)
2025-12-04T10:11:57.9707563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9707698Z     method(*args, **kwargs)
2025-12-04T10:11:57.9707989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9708052Z     with policy():
2025-12-04T10:11:57.9708379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9708447Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9709257Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9709262Z 
2025-12-04T10:11:57.9709386Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9709904Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9709910Z 
2025-12-04T10:11:57.9710064Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9710193Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9710284Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9710681Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9710817Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9710876Z graph_break []
2025-12-04T10:11:57.9711007Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9711697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9711770Z   if out == self.unknown_value:
2025-12-04T10:11:57.9711900Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9711994Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9712122Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9712467Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9712526Z graph_break []
2025-12-04T10:11:57.9712651Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9712742Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9712875Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9713226Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9713287Z graph_break []
2025-12-04T10:11:57.9713786Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f18b468d408a9813.xml -
2025-12-04T10:11:57.9713909Z =========================== short test summary info ============================
2025-12-04T10:11:57.9715226Z FAILED [0.4040s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9715265Z 
2025-12-04T10:11:57.9715395Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9715948Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9715954Z 
2025-12-04T10:11:57.9716115Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9716220Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9716337Z ================== 1 failed, 57 deselected, 2 rerun in 11.68s ==================
2025-12-04T10:11:57.9716396Z Got exit code 1
2025-12-04T10:11:57.9716462Z Retrying single test...
2025-12-04T10:11:57.9716731Z W1204 10:10:51.748000 66194 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9717286Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ada8ba8d71fda760.xml
2025-12-04T10:11:57.9717392Z ============================= test session starts ==============================
2025-12-04T10:11:57.9717600Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9717665Z cachedir: .pytest_cache
2025-12-04T10:11:57.9718035Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9718126Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9718193Z configfile: pytest.ini
2025-12-04T10:11:57.9718514Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9718644Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9719216Z stepcurrent: skipping 56 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9719288Z Running 1 items in this shard
2025-12-04T10:11:57.9719292Z 
2025-12-04T10:11:57.9720057Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:10:52.789383828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9720066Z 
2025-12-04T10:11:57.9720366Z [W1204 10:11:01.674350718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9720370Z 
2025-12-04T10:11:57.9720664Z [W1204 10:11:01.674608782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9720669Z 
2025-12-04T10:11:57.9720962Z [W1204 10:11:01.680486553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9720967Z 
2025-12-04T10:11:57.9721252Z [W1204 10:11:01.681068473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9721256Z 
2025-12-04T10:11:57.9721558Z [W1204 10:11:01.681235415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9721562Z 
2025-12-04T10:11:57.9721853Z [W1204 10:11:01.686641337 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9721857Z 
2025-12-04T10:11:57.9722275Z [W1204 10:11:01.687170496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9722324Z 
2025-12-04T10:11:57.9722619Z [W1204 10:11:01.687329929 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9722667Z 
2025-12-04T10:11:57.9722753Z ('RERUN', {'yellow': True}) [10.7303s] [100%]
2025-12-04T10:11:57.9723501Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:11:02.855035512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9723505Z 
2025-12-04T10:11:57.9723796Z [W1204 10:11:02.855598662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9723801Z 
2025-12-04T10:11:57.9724095Z [W1204 10:11:02.855737964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9724100Z 
2025-12-04T10:11:57.9724386Z [W1204 10:11:02.858602903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9724390Z 
2025-12-04T10:11:57.9724683Z [W1204 10:11:02.859153273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9724686Z 
2025-12-04T10:11:57.9725007Z [W1204 10:11:02.859290115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9725011Z 
2025-12-04T10:11:57.9725304Z [W1204 10:11:02.863763531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9725308Z 
2025-12-04T10:11:57.9725596Z [W1204 10:11:02.864217968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9725602Z 
2025-12-04T10:11:57.9725890Z [W1204 10:11:02.864366681 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9725894Z 
2025-12-04T10:11:57.9725977Z ('RERUN', {'yellow': True}) [0.4101s] [100%]
2025-12-04T10:11:57.9726698Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:11:03.263802566 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9726704Z 
2025-12-04T10:11:57.9726993Z [W1204 10:11:03.264376506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9726996Z 
2025-12-04T10:11:57.9727285Z [W1204 10:11:03.264515028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9727292Z 
2025-12-04T10:11:57.9727577Z [W1204 10:11:03.267422457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9727581Z 
2025-12-04T10:11:57.9727867Z [W1204 10:11:03.267975886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9727870Z 
2025-12-04T10:11:57.9728162Z [W1204 10:11:03.268117339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9728165Z 
2025-12-04T10:11:57.9728452Z [W1204 10:11:03.272800028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9728455Z 
2025-12-04T10:11:57.9728787Z [W1204 10:11:03.273262356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9728823Z 
2025-12-04T10:11:57.9729112Z [W1204 10:11:03.273423589 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9729115Z 
2025-12-04T10:11:57.9729214Z FAILED [0.4091s] [100%]
2025-12-04T10:11:57.9729217Z 
2025-12-04T10:11:57.9729302Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9729600Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9729680Z Traceback (most recent call last):
2025-12-04T10:11:57.9729990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9730061Z     method(*args, **kwargs)
2025-12-04T10:11:57.9730362Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9730428Z     method(*args, **kwargs)
2025-12-04T10:11:57.9730730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9730791Z     with policy():
2025-12-04T10:11:57.9731089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9731162Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9731993Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9731999Z 
2025-12-04T10:11:57.9732132Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9732656Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9732661Z 
2025-12-04T10:11:57.9732825Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9732954Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9733047Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9733400Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9733528Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9733588Z graph_break []
2025-12-04T10:11:57.9733714Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9734410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9734483Z   if out == self.unknown_value:
2025-12-04T10:11:57.9734777Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9734850Z Traceback (most recent call last):
2025-12-04T10:11:57.9735159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9735225Z     method(*args, **kwargs)
2025-12-04T10:11:57.9735526Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9735590Z     method(*args, **kwargs)
2025-12-04T10:11:57.9735917Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9736012Z     with policy():
2025-12-04T10:11:57.9736309Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9736426Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9737234Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9737238Z 
2025-12-04T10:11:57.9737366Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9737901Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9737908Z 
2025-12-04T10:11:57.9738067Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9738196Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9738291Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9738641Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9738804Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9738865Z graph_break []
2025-12-04T10:11:57.9738987Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9739680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9739763Z   if out == self.unknown_value:
2025-12-04T10:11:57.9739894Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9739985Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9740110Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9740456Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9740515Z graph_break []
2025-12-04T10:11:57.9740601Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9740891Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T10:11:57.9740965Z Traceback (most recent call last):
2025-12-04T10:11:57.9741268Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9741334Z     method(*args, **kwargs)
2025-12-04T10:11:57.9741631Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9741696Z     method(*args, **kwargs)
2025-12-04T10:11:57.9741986Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9742051Z     with policy():
2025-12-04T10:11:57.9742347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9742411Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9743505Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9743549Z 
2025-12-04T10:11:57.9743685Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9744211Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9744251Z 
2025-12-04T10:11:57.9744412Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9744540Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9744633Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9744974Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9745105Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9745164Z graph_break []
2025-12-04T10:11:57.9745287Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9745982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9746053Z   if out == self.unknown_value:
2025-12-04T10:11:57.9746211Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9746302Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9746425Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9746775Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9746835Z graph_break []
2025-12-04T10:11:57.9746957Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9747046Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9747173Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9747509Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9747568Z graph_break []
2025-12-04T10:11:57.9748053Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ada8ba8d71fda760.xml -
2025-12-04T10:11:57.9748156Z =========================== short test summary info ============================
2025-12-04T10:11:57.9749445Z FAILED [0.4091s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9749455Z 
2025-12-04T10:11:57.9749580Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9750099Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9750103Z 
2025-12-04T10:11:57.9750260Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9750400Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9750547Z ================== 1 failed, 57 deselected, 2 rerun in 11.57s ==================
2025-12-04T10:11:57.9750613Z Got exit code 1
2025-12-04T10:11:57.9751086Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T10:11:57.9751367Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.9751631Z W1204 10:11:09.929000 66380 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9752017Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6bbd4f6ab2b6130.xml
2025-12-04T10:11:57.9752120Z ============================= test session starts ==============================
2025-12-04T10:11:57.9752331Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9752401Z cachedir: .pytest_cache
2025-12-04T10:11:57.9752706Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9752783Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9752852Z configfile: pytest.ini
2025-12-04T10:11:57.9753170Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9753334Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9753423Z stepcurrent: skipping 57 already run items.
2025-12-04T10:11:57.9753492Z Running 1 items in this shard
2025-12-04T10:11:57.9753496Z 
2025-12-04T10:11:57.9753992Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [1.8676s] [100%]
2025-12-04T10:11:57.9754479Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4534s] [100%]
2025-12-04T10:11:57.9754937Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.4397s] [100%]
2025-12-04T10:11:57.9754941Z 
2025-12-04T10:11:57.9755028Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9755319Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9755393Z Traceback (most recent call last):
2025-12-04T10:11:57.9755701Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9755771Z     method(*args, **kwargs)
2025-12-04T10:11:57.9756068Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9756132Z     method(*args, **kwargs)
2025-12-04T10:11:57.9756427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9756488Z     with policy():
2025-12-04T10:11:57.9756785Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9756853Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9757646Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9757695Z 
2025-12-04T10:11:57.9757860Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9758378Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9758415Z 
2025-12-04T10:11:57.9758575Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9758704Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9758797Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9759143Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9759270Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9759328Z graph_break []
2025-12-04T10:11:57.9759622Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9759696Z Traceback (most recent call last):
2025-12-04T10:11:57.9760050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9760119Z     method(*args, **kwargs)
2025-12-04T10:11:57.9760410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9760515Z     method(*args, **kwargs)
2025-12-04T10:11:57.9760809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9760871Z     with policy():
2025-12-04T10:11:57.9761168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9761235Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9762044Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9762049Z 
2025-12-04T10:11:57.9762173Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9762694Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9762698Z 
2025-12-04T10:11:57.9762856Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9762981Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9763084Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9763432Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9763562Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9763622Z graph_break []
2025-12-04T10:11:57.9763756Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9763851Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9763973Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9764313Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9764377Z graph_break []
2025-12-04T10:11:57.9764461Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9764843Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9764918Z Traceback (most recent call last):
2025-12-04T10:11:57.9765225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9765330Z     method(*args, **kwargs)
2025-12-04T10:11:57.9765625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9765692Z     method(*args, **kwargs)
2025-12-04T10:11:57.9765983Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9766042Z     with policy():
2025-12-04T10:11:57.9766348Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9766416Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9767231Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9767240Z 
2025-12-04T10:11:57.9767365Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9767921Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9767926Z 
2025-12-04T10:11:57.9768087Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9768213Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9768308Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9768656Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9768782Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9768844Z graph_break []
2025-12-04T10:11:57.9768964Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9769055Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9769180Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9769518Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9769583Z graph_break []
2025-12-04T10:11:57.9769706Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9769801Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9769932Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9770273Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9770339Z graph_break []
2025-12-04T10:11:57.9770831Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6bbd4f6ab2b6130.xml -
2025-12-04T10:11:57.9770931Z =========================== short test summary info ============================
2025-12-04T10:11:57.9772253Z FAILED [0.4397s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9772320Z 
2025-12-04T10:11:57.9772448Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9772972Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9772975Z 
2025-12-04T10:11:57.9773132Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9773242Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9773358Z ================== 1 failed, 57 deselected, 2 rerun in 2.78s ===================
2025-12-04T10:11:57.9773418Z Got exit code 1
2025-12-04T10:11:57.9773493Z Retrying single test...
2025-12-04T10:11:57.9773753Z W1204 10:11:19.570000 66561 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9774136Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6b9751c3a5f583fd.xml
2025-12-04T10:11:57.9774239Z ============================= test session starts ==============================
2025-12-04T10:11:57.9774480Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9774551Z cachedir: .pytest_cache
2025-12-04T10:11:57.9774865Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9774943Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9775012Z configfile: pytest.ini
2025-12-04T10:11:57.9775330Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9775457Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9776028Z stepcurrent: skipping 57 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9776099Z Running 1 items in this shard
2025-12-04T10:11:57.9776103Z 
2025-12-04T10:11:57.9776842Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:11:20.847036940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9776846Z 
2025-12-04T10:11:57.9777148Z [W1204 10:11:30.968059869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9777152Z 
2025-12-04T10:11:57.9777447Z [W1204 10:11:30.968324473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9777450Z 
2025-12-04T10:11:57.9777740Z [W1204 10:11:30.974270512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9777744Z 
2025-12-04T10:11:57.9778037Z [W1204 10:11:30.974867302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9778041Z 
2025-12-04T10:11:57.9778334Z [W1204 10:11:30.975038245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9778338Z 
2025-12-04T10:11:57.9778630Z [W1204 10:11:30.980534746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9778703Z 
2025-12-04T10:11:57.9778995Z [W1204 10:11:30.981076865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9778999Z 
2025-12-04T10:11:57.9779285Z [W1204 10:11:30.981234888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9779321Z 
2025-12-04T10:11:57.9779407Z ('RERUN', {'yellow': True}) [11.0205s] [100%]
2025-12-04T10:11:57.9780130Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:11:31.969446881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9780134Z 
2025-12-04T10:11:57.9780427Z [W1204 10:11:31.970024290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9780434Z 
2025-12-04T10:11:57.9780722Z [W1204 10:11:31.970167333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9780725Z 
2025-12-04T10:11:57.9781014Z [W1204 10:11:31.973050541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9781019Z 
2025-12-04T10:11:57.9781340Z [W1204 10:11:31.973594430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9781344Z 
2025-12-04T10:11:57.9781640Z [W1204 10:11:31.973731442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9781643Z 
2025-12-04T10:11:57.9781932Z [W1204 10:11:31.978129516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9781937Z 
2025-12-04T10:11:57.9782226Z [W1204 10:11:31.978585654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9782233Z 
2025-12-04T10:11:57.9782520Z [W1204 10:11:31.978721036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9782524Z 
2025-12-04T10:11:57.9782609Z ('RERUN', {'yellow': True}) [0.4099s] [100%]
2025-12-04T10:11:57.9783332Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:11:31.377713587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9783336Z 
2025-12-04T10:11:57.9783626Z [W1204 10:11:31.378277987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9783632Z 
2025-12-04T10:11:57.9783923Z [W1204 10:11:31.378417399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9783926Z 
2025-12-04T10:11:57.9784218Z [W1204 10:11:31.381398148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9784223Z 
2025-12-04T10:11:57.9784515Z [W1204 10:11:31.381949317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9784519Z 
2025-12-04T10:11:57.9784808Z [W1204 10:11:31.382088250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9784811Z 
2025-12-04T10:11:57.9785106Z [W1204 10:11:31.386657796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9785109Z 
2025-12-04T10:11:57.9785465Z [W1204 10:11:31.387117893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9785469Z 
2025-12-04T10:11:57.9785765Z [W1204 10:11:31.387253945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9785804Z 
2025-12-04T10:11:57.9785867Z FAILED [0.4068s] [100%]
2025-12-04T10:11:57.9785871Z 
2025-12-04T10:11:57.9785963Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9786265Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9786339Z Traceback (most recent call last):
2025-12-04T10:11:57.9786645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9786718Z     method(*args, **kwargs)
2025-12-04T10:11:57.9787017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9787084Z     method(*args, **kwargs)
2025-12-04T10:11:57.9787375Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9787436Z     with policy():
2025-12-04T10:11:57.9787735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9787839Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9788639Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9788651Z 
2025-12-04T10:11:57.9788779Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9789299Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9789304Z 
2025-12-04T10:11:57.9789472Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9789600Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9789698Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9790048Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9790175Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9790239Z graph_break []
2025-12-04T10:11:57.9790364Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9791067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9791139Z   if out == self.unknown_value:
2025-12-04T10:11:57.9791428Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9791506Z Traceback (most recent call last):
2025-12-04T10:11:57.9791810Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9791885Z     method(*args, **kwargs)
2025-12-04T10:11:57.9792187Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9792307Z     method(*args, **kwargs)
2025-12-04T10:11:57.9792639Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9792700Z     with policy():
2025-12-04T10:11:57.9792998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9793101Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9793908Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9793911Z 
2025-12-04T10:11:57.9794046Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9794566Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9794571Z 
2025-12-04T10:11:57.9794729Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9794858Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9794952Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9795332Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9795460Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9795518Z graph_break []
2025-12-04T10:11:57.9795646Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9796340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9796417Z   if out == self.unknown_value:
2025-12-04T10:11:57.9796539Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9796630Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9796758Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9797104Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9797167Z graph_break []
2025-12-04T10:11:57.9797250Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9797540Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9797619Z Traceback (most recent call last):
2025-12-04T10:11:57.9797920Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9797983Z     method(*args, **kwargs)
2025-12-04T10:11:57.9798282Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9798345Z     method(*args, **kwargs)
2025-12-04T10:11:57.9798644Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9798702Z     with policy():
2025-12-04T10:11:57.9798996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9799066Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9799957Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9799993Z 
2025-12-04T10:11:57.9800163Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9800682Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9800686Z 
2025-12-04T10:11:57.9800844Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9800970Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9801060Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9801414Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9801540Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9801597Z graph_break []
2025-12-04T10:11:57.9801724Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9802451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9802527Z   if out == self.unknown_value:
2025-12-04T10:11:57.9802649Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9802738Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9802863Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9803203Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9803264Z graph_break []
2025-12-04T10:11:57.9803391Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9803480Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9803608Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9803953Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9804013Z graph_break []
2025-12-04T10:11:57.9804502Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6b9751c3a5f583fd.xml -
2025-12-04T10:11:57.9804604Z =========================== short test summary info ============================
2025-12-04T10:11:57.9805888Z FAILED [0.4068s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9805894Z 
2025-12-04T10:11:57.9806018Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9806537Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9806541Z 
2025-12-04T10:11:57.9806734Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9806876Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9806995Z ================== 1 failed, 57 deselected, 2 rerun in 11.86s ==================
2025-12-04T10:11:57.9807088Z Got exit code 1
2025-12-04T10:11:57.9807154Z Retrying single test...
2025-12-04T10:11:57.9807436Z W1204 10:11:37.942000 66747 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9807829Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f1e64a16331aaa14.xml
2025-12-04T10:11:57.9807928Z ============================= test session starts ==============================
2025-12-04T10:11:57.9808137Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9808204Z cachedir: .pytest_cache
2025-12-04T10:11:57.9808515Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9808592Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9808660Z configfile: pytest.ini
2025-12-04T10:11:57.9808977Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9809109Z collecting ... collected 58 items / 57 deselected / 1 selected
2025-12-04T10:11:57.9809716Z stepcurrent: skipping 57 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9809787Z Running 1 items in this shard
2025-12-04T10:11:57.9809791Z 
2025-12-04T10:11:57.9810528Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:11:39.203827166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9810533Z 
2025-12-04T10:11:57.9810829Z [W1204 10:11:48.340366985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9810834Z 
2025-12-04T10:11:57.9811128Z [W1204 10:11:48.340622279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9811132Z 
2025-12-04T10:11:57.9811425Z [W1204 10:11:48.346423177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9811428Z 
2025-12-04T10:11:57.9811717Z [W1204 10:11:48.347017087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9811724Z 
2025-12-04T10:11:57.9812014Z [W1204 10:11:48.347198401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9812019Z 
2025-12-04T10:11:57.9812307Z [W1204 10:11:48.352685324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9812312Z 
2025-12-04T10:11:57.9812602Z [W1204 10:11:48.353223543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9812605Z 
2025-12-04T10:11:57.9812896Z [W1204 10:11:48.353379635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9812899Z 
2025-12-04T10:11:57.9812985Z ('RERUN', {'yellow': True}) [11.0243s] [100%]
2025-12-04T10:11:57.9813741Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:11:49.344970273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9813777Z 
2025-12-04T10:11:57.9814073Z [W1204 10:11:49.345546802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9814109Z 
2025-12-04T10:11:57.9814397Z [W1204 10:11:49.345691775 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9814400Z 
2025-12-04T10:11:57.9814694Z [W1204 10:11:49.348609944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9814697Z 
2025-12-04T10:11:57.9814997Z [W1204 10:11:49.349168244 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9815000Z 
2025-12-04T10:11:57.9815293Z [W1204 10:11:49.349308886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9815298Z 
2025-12-04T10:11:57.9815591Z [W1204 10:11:49.353808342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9815596Z 
2025-12-04T10:11:57.9815882Z [W1204 10:11:49.354278690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9815885Z 
2025-12-04T10:11:57.9816228Z [W1204 10:11:49.354416472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9816232Z 
2025-12-04T10:11:57.9816312Z ('RERUN', {'yellow': True}) [0.4116s] [100%]
2025-12-04T10:11:57.9817185Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:11:49.753824159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9817191Z 
2025-12-04T10:11:57.9817487Z [W1204 10:11:49.754391518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9817492Z 
2025-12-04T10:11:57.9817787Z [W1204 10:11:49.754536861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9817790Z 
2025-12-04T10:11:57.9818082Z [W1204 10:11:49.757472030 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9818085Z 
2025-12-04T10:11:57.9818378Z [W1204 10:11:49.758032250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9818385Z 
2025-12-04T10:11:57.9818675Z [W1204 10:11:49.758172162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9818681Z 
2025-12-04T10:11:57.9818972Z [W1204 10:11:49.762670949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9818977Z 
2025-12-04T10:11:57.9819275Z [W1204 10:11:49.763136126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9819277Z 
2025-12-04T10:11:57.9819565Z [W1204 10:11:49.763274289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:11:57.9819568Z 
2025-12-04T10:11:57.9819631Z FAILED [0.4088s] [100%]
2025-12-04T10:11:57.9819634Z 
2025-12-04T10:11:57.9819722Z ==================================== RERUNS ====================================
2025-12-04T10:11:57.9820015Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9820206Z Traceback (most recent call last):
2025-12-04T10:11:57.9820521Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9820597Z     method(*args, **kwargs)
2025-12-04T10:11:57.9820948Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9821012Z     method(*args, **kwargs)
2025-12-04T10:11:57.9821311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9821371Z     with policy():
2025-12-04T10:11:57.9821670Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9821737Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9822537Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 230686720 and is now 274726912.
2025-12-04T10:11:57.9822543Z 
2025-12-04T10:11:57.9822676Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9823246Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9823250Z 
2025-12-04T10:11:57.9823418Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9823548Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9823641Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9823996Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9824129Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9824191Z graph_break []
2025-12-04T10:11:57.9824316Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9825010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9825086Z   if out == self.unknown_value:
2025-12-04T10:11:57.9825388Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9825467Z Traceback (most recent call last):
2025-12-04T10:11:57.9825769Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9825836Z     method(*args, **kwargs)
2025-12-04T10:11:57.9826131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9826194Z     method(*args, **kwargs)
2025-12-04T10:11:57.9826488Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9826550Z     with policy():
2025-12-04T10:11:57.9826847Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9826915Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9827756Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 274726912 and is now 276824064.
2025-12-04T10:11:57.9827793Z 
2025-12-04T10:11:57.9827920Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9828453Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9828491Z 
2025-12-04T10:11:57.9828650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9828784Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9828879Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9829226Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9829361Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9829422Z graph_break []
2025-12-04T10:11:57.9829549Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9830241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9830311Z   if out == self.unknown_value:
2025-12-04T10:11:57.9830539Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9830631Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9830761Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9831105Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9831172Z graph_break []
2025-12-04T10:11:57.9831261Z =================================== FAILURES ===================================
2025-12-04T10:11:57.9831552Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T10:11:57.9831627Z Traceback (most recent call last):
2025-12-04T10:11:57.9831930Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9832004Z     method(*args, **kwargs)
2025-12-04T10:11:57.9832310Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T10:11:57.9832375Z     method(*args, **kwargs)
2025-12-04T10:11:57.9832664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T10:11:57.9832728Z     with policy():
2025-12-04T10:11:57.9833025Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T10:11:57.9833096Z     raise RuntimeError(msg)
2025-12-04T10:11:57.9833900Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9833905Z 
2025-12-04T10:11:57.9834030Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9834554Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9834557Z 
2025-12-04T10:11:57.9834717Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9834917Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9835008Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9835349Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9835512Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9835573Z graph_break []
2025-12-04T10:11:57.9835702Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:11:57.9836390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:11:57.9836461Z   if out == self.unknown_value:
2025-12-04T10:11:57.9836588Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9836679Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9836806Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9837148Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9837219Z graph_break []
2025-12-04T10:11:57.9837384Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:11:57.9837477Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T10:11:57.9837599Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)]
2025-12-04T10:11:57.9837941Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)]
2025-12-04T10:11:57.9838000Z graph_break []
2025-12-04T10:11:57.9838500Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f1e64a16331aaa14.xml -
2025-12-04T10:11:57.9838601Z =========================== short test summary info ============================
2025-12-04T10:11:57.9839928Z FAILED [0.4088s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 276824064 and is now 278921216.
2025-12-04T10:11:57.9839937Z 
2025-12-04T10:11:57.9840064Z To execute this test, run the following from the base repo dir:
2025-12-04T10:11:57.9840585Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9840594Z 
2025-12-04T10:11:57.9840751Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:11:57.9840860Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:11:57.9840978Z ================== 1 failed, 57 deselected, 2 rerun in 11.87s ==================
2025-12-04T10:11:57.9841038Z Got exit code 1
2025-12-04T10:11:57.9841510Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T10:11:57.9841756Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:11:57.9842058Z W1204 10:11:56.323000 66933 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:11:57.9842481Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2b30602e906f7649.xml
2025-12-04T10:11:57.9842629Z ============================= test session starts ==============================
2025-12-04T10:11:57.9842840Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:11:57.9842910Z cachedir: .pytest_cache
2025-12-04T10:11:57.9843221Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:11:57.9843298Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:11:57.9843364Z configfile: pytest.ini
2025-12-04T10:11:57.9843679Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:11:57.9843812Z collecting ... collected 58 items / 58 deselected / 0 selected
2025-12-04T10:11:57.9843912Z stepcurrent: skipping 58 already run items.
2025-12-04T10:11:57.9843984Z Running 0 items in this shard
2025-12-04T10:11:57.9843988Z 
2025-12-04T10:11:57.9844473Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2b30602e906f7649.xml -
2025-12-04T10:11:57.9844572Z ============================ 58 deselected in 0.01s ============================
2025-12-04T10:11:57.9870297Z The following tests failed consistently: ['test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16']
2025-12-04T10:11:57.9870459Z 
2025-12-04T10:11:57.9870839Z FINISHED PRINTING LOG FILE of inductor/test_cuda_select_algorithm 1/1 (test/test-reports/inductor.test_cuda_select_algorithm_1.1_c5144f504c6801ae_.log)
2025-12-04T10:11:57.9870843Z 
2025-12-04T10:11:57.9871078Z Finished inductor/test_cuda_select_algorithm 1/1 ... [2025-12-04 10:11:57.382849][4757.31113764], took 45.35min
2025-12-04T10:11:57.9871599Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1d5e50d3220be84.xml
2025-12-04T10:11:57.9872147Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-68bd725ac012aaf6.xml
2025-12-04T10:11:57.9872691Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f43d696b91c68e27.xml
2025-12-04T10:11:57.9873195Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32d3a27d38e00e52.xml
2025-12-04T10:11:57.9873753Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d971c2b5fa40f28c.xml
2025-12-04T10:11:57.9874257Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d4e5ee130381ea3.xml
2025-12-04T10:11:57.9874762Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7f039b6301f03638.xml
2025-12-04T10:11:57.9875275Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7547e2319a805dd.xml
2025-12-04T10:11:57.9875782Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f314cd6b44b1cdb.xml
2025-12-04T10:11:57.9876321Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-31537f65aa77d4f4.xml
2025-12-04T10:11:57.9876829Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11e8fb2fd4357c15.xml
2025-12-04T10:11:57.9877340Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-75666f3891d9ac7f.xml
2025-12-04T10:11:57.9877847Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bbf4ef91870a527.xml
2025-12-04T10:11:57.9878360Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-144df5003ab71cee.xml
2025-12-04T10:11:57.9878889Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5fd5f82c697f5c0c.xml
2025-12-04T10:11:57.9879392Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93ba75b7427cf884.xml
2025-12-04T10:11:57.9879940Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db3472bddf12b7a7.xml
2025-12-04T10:11:57.9907058Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-97d0cdfaafee5426.xml
2025-12-04T10:11:58.0236679Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3a149555401c32cc.xml
2025-12-04T10:11:58.0575123Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-64ab9f5424c5493f.xml
2025-12-04T10:11:58.0915647Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bdae605562476ceb.xml
2025-12-04T10:11:58.1300515Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b2bbf25d96b76c9b.xml
2025-12-04T10:11:58.1657732Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5ac824c12758af27.xml
2025-12-04T10:11:58.1998949Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5a8939c38696fa6e.xml
2025-12-04T10:11:58.2305748Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8de08e52169132e4.xml
2025-12-04T10:11:58.2628483Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c872303ed892824.xml
2025-12-04T10:11:58.3015114Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86972921d28d1709.xml
2025-12-04T10:11:58.3404979Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-08f8aa88da0d4c3d.xml
2025-12-04T10:11:58.3976957Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6e5694a381ab599.xml
2025-12-04T10:11:58.4318353Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a1c1c2119d10732c.xml
2025-12-04T10:11:58.4627166Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c94947c9bb46a4e.xml
2025-12-04T10:11:58.4946339Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e2657ebcfa165043.xml
2025-12-04T10:11:58.5277539Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e5a9540a53f5bbd7.xml
2025-12-04T10:11:58.5641351Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f5535d6178d67f54.xml
2025-12-04T10:11:58.5947831Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-839913cdd4a5fdb2.xml
2025-12-04T10:11:58.6267381Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca344a44fcbdba6a.xml
2025-12-04T10:11:58.6557596Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d9537209be9ce80.xml
2025-12-04T10:11:58.6889990Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b6de87f4ee6a6c38.xml
2025-12-04T10:11:58.7177716Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-462df064e3458fc9.xml
2025-12-04T10:11:58.7482437Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f351581eb409e8d.xml
2025-12-04T10:11:58.7787926Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-94c0e5e2bee831c2.xml
2025-12-04T10:11:58.8116239Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7a973581a4e2c554.xml
2025-12-04T10:11:58.8475298Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-518d2a063958b0ac.xml
2025-12-04T10:11:58.8844677Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1cf5b0397cd79e9.xml
2025-12-04T10:11:58.9195603Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b693ef47858459cd.xml
2025-12-04T10:11:58.9496721Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c603aefabd564f6f.xml
2025-12-04T10:11:58.9823888Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-476ed3473033d71c.xml
2025-12-04T10:11:59.0111098Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38079583fa3f76bd.xml
2025-12-04T10:11:59.0433504Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc6fbe2f84088a12.xml
2025-12-04T10:11:59.0876396Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a9efcd8b80cecd97.xml
2025-12-04T10:11:59.1167903Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a00be3c10f587c4d.xml
2025-12-04T10:11:59.1464249Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21bfb76ef730b721.xml
2025-12-04T10:11:59.1806183Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51fe451ae52d8ee9.xml
2025-12-04T10:11:59.2108093Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7e2b876b221ae6e.xml
2025-12-04T10:11:59.2443642Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a296a1ae2f954511.xml
2025-12-04T10:11:59.2767929Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-173808d08d9ed556.xml
2025-12-04T10:11:59.3107721Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0128790a7f0c548c.xml
2025-12-04T10:11:59.3497175Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2efc3529636beb3d.xml
2025-12-04T10:11:59.3878388Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5778c6a42245e5c5.xml
2025-12-04T10:11:59.4338244Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6cbd45f232782bc2.xml
2025-12-04T10:11:59.4636836Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d890e4a6cbb89712.xml
2025-12-04T10:11:59.4967254Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b4568ad5eb5915b3.xml
2025-12-04T10:11:59.5265434Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ebe2595646f336e.xml
2025-12-04T10:11:59.5595504Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cfc45be16d95a5ee.xml
2025-12-04T10:11:59.5884389Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4b3d7c6eebbf264b.xml
2025-12-04T10:11:59.6222159Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7323c5eff762fde9.xml
2025-12-04T10:11:59.6616744Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db81609099e15efb.xml
2025-12-04T10:11:59.7007553Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8169c375ae58c76b.xml
2025-12-04T10:11:59.7304445Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ac75bac96a56365f.xml
2025-12-04T10:11:59.7716066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-674ad3938f78a3d3.xml
2025-12-04T10:11:59.8007902Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f085b783f0e405ac.xml
2025-12-04T10:11:59.8573022Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-84307678eab5d217.xml
2025-12-04T10:11:59.8855877Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93f441dcac87b0dc.xml
2025-12-04T10:11:59.9164012Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-093a939a7121f539.xml
2025-12-04T10:11:59.9495251Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ed7e6d31e19a7f77.xml
2025-12-04T10:11:59.9793719Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fcb36d28ba877da8.xml
2025-12-04T10:12:00.0137938Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb130a6ac4c4d42.xml
2025-12-04T10:12:00.0462883Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d436a35a57eaea90.xml
2025-12-04T10:12:00.0807260Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76b1db4df066ac09.xml
2025-12-04T10:12:00.1145749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bbaa588317639c61.xml
2025-12-04T10:12:00.1493858Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7cb6908bcfc4804b.xml
2025-12-04T10:12:00.1865939Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cd6d9f99b37f4011.xml
2025-12-04T10:12:00.2143193Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d059803612c07abe.xml
2025-12-04T10:12:00.2456463Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d2f99eb08b618a0a.xml
2025-12-04T10:12:00.2784037Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76dedcabb72bb30d.xml
2025-12-04T10:12:00.3096073Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d102c48975f66f00.xml
2025-12-04T10:12:00.3423693Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e12a02efbce3f8f2.xml
2025-12-04T10:12:00.3738917Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-835df1857998cf06.xml
2025-12-04T10:12:00.4078751Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b90dc48e94da60a1.xml
2025-12-04T10:12:00.4384638Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a5fa72618c2406.xml
2025-12-04T10:12:00.4727371Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32c3413eac3481c3.xml
2025-12-04T10:12:00.5054836Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b9498a5ec773296.xml
2025-12-04T10:12:00.5358976Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d690a534f220c503.xml
2025-12-04T10:12:00.5713927Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8635fba9f5b5afed.xml
2025-12-04T10:12:00.6046935Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2adccf8b9e051d5a.xml
2025-12-04T10:12:00.6738713Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50234d62b4ab45ea.xml
2025-12-04T10:12:00.7054421Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fda8ac892cff9b52.xml
2025-12-04T10:12:00.7378946Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6daec75554d576a1.xml
2025-12-04T10:12:00.7687662Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7357125e19fc0b47.xml
2025-12-04T10:12:00.8036414Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1372e7af4dc93064.xml
2025-12-04T10:12:00.8405693Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1890de1440f6da93.xml
2025-12-04T10:12:00.9690494Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed71c1109750bb2.xml
2025-12-04T10:12:01.0019287Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b4b98b76b112369.xml
2025-12-04T10:12:01.0323523Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fc4f9e9eb787f925.xml
2025-12-04T10:12:01.0639375Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cf10e80d579ed1a1.xml
2025-12-04T10:12:01.0984139Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f768cb22e37c95bb.xml
2025-12-04T10:12:01.1345592Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4cd3ee76d86b3b2d.xml
2025-12-04T10:12:01.1836395Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dd5744bb7f1104d.xml
2025-12-04T10:12:01.2156634Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8f8bda0471bacaab.xml
2025-12-04T10:12:01.2493738Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-39442f28ac15f7dd.xml
2025-12-04T10:12:01.2807861Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b5211a64a27fb03.xml
2025-12-04T10:12:01.3172679Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-03811a38f7309b37.xml
2025-12-04T10:12:01.3437935Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c3f4a82c64f8b823.xml
2025-12-04T10:12:01.3738085Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c722331da90a17a1.xml
2025-12-04T10:12:01.4038693Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-715dcfb7265e7117.xml
2025-12-04T10:12:01.4425739Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af0a42bb02245e10.xml
2025-12-04T10:12:01.4776844Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5a947cb713f2103.xml
2025-12-04T10:12:01.5165040Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c9a860fbca8c784e.xml
2025-12-04T10:12:01.5408913Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-57d06208bb64cb40.xml
2025-12-04T10:12:01.5697831Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-27d39a08641974ca.xml
2025-12-04T10:12:01.5982470Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c115897706ac37ea.xml
2025-12-04T10:12:01.6323146Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-99c6159c4eb555cf.xml
2025-12-04T10:12:01.7127498Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71859eedfe6269a5.xml
2025-12-04T10:12:01.7471370Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b7ad6bc433aca4f5.xml
2025-12-04T10:12:01.7746967Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb71453e3d3b813.xml
2025-12-04T10:12:01.8038307Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1f8f7752fccd9869.xml
2025-12-04T10:12:01.8335709Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b45e15b9b3058993.xml
2025-12-04T10:12:01.8754567Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ab51729f4958ddc5.xml
2025-12-04T10:12:01.9026055Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c75b79372dbd5cd7.xml
2025-12-04T10:12:01.9299812Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6e05e1cd235f382.xml
2025-12-04T10:12:01.9755413Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b5713604e4d5a687.xml
2025-12-04T10:12:02.0046040Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98fe1568229d1f43.xml
2025-12-04T10:12:02.0346508Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-89a0569137f2a5f8.xml
2025-12-04T10:12:02.0613636Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-26852b57f22709e5.xml
2025-12-04T10:12:02.0968694Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51aaf4e0af1c22f7.xml
2025-12-04T10:12:02.1259302Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dc138e7c3d90d405.xml
2025-12-04T10:12:02.1555039Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11f1088c00e16c8c.xml
2025-12-04T10:12:02.1876206Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3523d5aaa7729d0c.xml
2025-12-04T10:12:02.2146122Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-70de31050b612090.xml
2025-12-04T10:12:02.2437490Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96a27193d0a2e839.xml
2025-12-04T10:12:02.2803447Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc5a2675f46e34d3.xml
2025-12-04T10:12:02.3097512Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-37ac0a15b5eff353.xml
2025-12-04T10:12:02.3398942Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a5da48d7d65453d4.xml
2025-12-04T10:12:02.3724237Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dba8e879764f929.xml
2025-12-04T10:12:02.4024848Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0d4d42e91b0ff091.xml
2025-12-04T10:12:02.4455767Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-659fbe1db9f9f989.xml
2025-12-04T10:12:02.4798529Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0f9da1e77120ab8a.xml
2025-12-04T10:12:02.5133617Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-91b8af2bf22e5dbf.xml
2025-12-04T10:12:02.5466743Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6902f75d647c91e7.xml
2025-12-04T10:12:02.5758476Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bee3dd53feb5961.xml
2025-12-04T10:12:02.6016432Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-17bc86173edb9567.xml
2025-12-04T10:12:02.6266093Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-12cbce7f716a0669.xml
2025-12-04T10:12:02.6536263Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71532e1cbeaa1931.xml
2025-12-04T10:12:02.6925953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1cbc5ac56a047f28.xml
2025-12-04T10:12:02.7226502Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a3505d51f13f273.xml
2025-12-04T10:12:02.7499168Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4bc023f248c82374.xml
2025-12-04T10:12:02.7784100Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-80114f319d6e3dd1.xml
2025-12-04T10:12:02.8187263Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5d4e24682433b20.xml
2025-12-04T10:12:02.8495968Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1259359197037313.xml
2025-12-04T10:12:02.8791878Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50141705a26d91cc.xml
2025-12-04T10:12:02.9079251Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-032da2f374cad8bd.xml
2025-12-04T10:12:02.9418127Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0bdf2ccaad64a4e2.xml
2025-12-04T10:12:02.9718019Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f7b942a83386066d.xml
2025-12-04T10:12:03.0018094Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96d9c65e819c8d75.xml
2025-12-04T10:12:03.0329428Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86387a3ec48a5612.xml
2025-12-04T10:12:03.0623176Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86b90fcd4d18651c.xml
2025-12-04T10:12:03.0906624Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c3b1417ea80e2f0.xml
2025-12-04T10:12:03.1217726Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7815a5e2a911334a.xml
2025-12-04T10:12:03.1495407Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8773df7cdfc9f682.xml
2025-12-04T10:12:03.1819326Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f18b468d408a9813.xml
2025-12-04T10:12:03.2127587Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ada8ba8d71fda760.xml
2025-12-04T10:12:03.2379096Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6bbd4f6ab2b6130.xml
2025-12-04T10:12:03.2685598Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6b9751c3a5f583fd.xml
2025-12-04T10:12:03.2984497Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f1e64a16331aaa14.xml
2025-12-04T10:12:03.3279807Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2b30602e906f7649.xml
2025-12-04T10:12:03.6466180Z Uploading logs for 57116084862 to S3
2025-12-04T10:12:03.7671762Z Uploading artifacts took 0.42 seconds
2025-12-04T10:12:03.7672132Z inductor/test_cuda_select_algorithm 1/1 failed!
2025-12-04T10:12:03.7675596Z Running inductor/test_compile_subprocess 2/2 ... [2025-12-04 10:12:03.767352][4763.695648876]
2025-12-04T10:12:03.7676089Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:12:03.7679456Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compile_subprocess.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:12:03.767694]
2025-12-04T10:18:48.8724179Z 
2025-12-04T10:18:48.8725459Z inductor/test_compile_subprocess 2/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compile_subprocess_2.2_b00c67e905398654_.log
2025-12-04T10:18:48.8932054Z Running 464 items in this shard: test/inductor/test_compile_subprocess.py::TestSubprocess::test_async, test/inductor/test_compile_subprocess.py::GPUTests::test__dyn_quant_pack_4bit_weight_bf16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test__unsafe_masked_index_put_accumulate_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool2d1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool2d_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_max_pool2d2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex6_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex7_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex9_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex_strided_fallback_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_const_float_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_const_int_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aliased_buffer_reuse_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_allow_reuse_active_if_under_peak_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_any_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_cache_hit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_dtype_device_layout_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_support_str_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_with_persistent_cache_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange6_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin_with_duplicates_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin_with_nan_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_min_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_as_strided_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_as_strided_on_views_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_assert_alignment_op_name_pass_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_assert_size_stride_op_name_fail_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_async, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d7_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool3d_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool_errors_with_uint_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_batch_norm_2d_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_batch_norm_2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bfloat16_to_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bitwise3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bmm2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_add_autotune_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_default_kwargs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int16_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int16_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int16_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int16_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int32_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int32_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_nd_tiling_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_buffer_batch_norm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_buffer_copied_in_graph_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_float_ndigits_pos_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_empty_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_empty_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_extern_kernel_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_of_loops_and_extern_kernel_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_unbacked_empty_1d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_upcasting_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cauchy_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_check_stack_no_cycles_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_clamp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_clamp_type_promotion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_clone_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_computed_buffer_inlining_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_concat_add_inplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_config_option_dont_assume_alignment_cudagraphs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_config_option_dont_assume_alignment_recompiles_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_consecutive_split_cumprod_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_consecutive_split_cumsum_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_3d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_nd_inplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv1d_depthwise_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv2d_backward_channels_last_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv2d_channels_last_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv3d_channels_last_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv_functional_bn_fuse_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_convolution2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_convolution5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_copy_non_blocking_is_pinned_use_cat_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_copy_non_blocking_is_pinned_use_cat_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cos_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_cpu_scalar_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_cpu_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_gpu_tensor_cpp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_gpu_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_tensor_with_cpu_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cudnn_rnn_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cummin_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cumprod_zero_dim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cumsum_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cumsum_no_mask_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cumsum_zero_dim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_scan_op_compiled_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_scan_would_split_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_deterministic_codegen_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_deterministic_codegen_on_graph_break_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_device_assert_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dist_bf16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dist_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div6_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div7_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div_zero_dim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dropout2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dropout3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dropout_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dropout_deterministic_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dropout_trivial_0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_embedding_bag_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_embedding_sparse_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_empty1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_empty2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_emulate_precision_triton_fp_fusion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_erfc_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_erfinv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_expanded_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_expm1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_basic_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_list_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_list_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_no_mutated_tensors_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_with_return_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fft_real_input_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fill1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_float16_to_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_float32_to_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_float_index_expression_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_float_index_expression_type_promotion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fmin_fmax_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fmod_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fmod_zero_dim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fractional_max_pool2d1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fractional_max_pool2d2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fractional_max_pool2d4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_full_like_sliced_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_full_like_transposed_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_full_truncation_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fuse_large_params_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fuse_tiled_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fusing_write_into_disjoint_read_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gather2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gather3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_generate_rand_fp8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_generated_code_has_alignment_assert_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_generated_code_has_size_stride_assert_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_glu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gpu_scalar_with_cpu_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gpu_scalar_with_gpu_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_arange1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_argmax_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_both_scalars_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_constant_tensor1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_constant_tensor2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_no_inputs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_pad_dynamic_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_refcount_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_unbacked_symint_as_output_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_grid_sampler_2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_grid_sampler_expand_preserves_view_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_hardsigmoid_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_hardswish_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_horizonal_fusion2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_float_zero_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_abs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_device_assert_masked_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_flip_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_floordiv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_nested_indirect_indexing_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_remainder_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_fallback1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_fallback2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_select_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inductor_assert_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inductor_layout_optimization_input_mutations_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inductor_multiple_specializations_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inductor_triton_bucketize_respects_masking_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inner_fn_str_and_stride_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inplace_activations_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inplace_mixed_dtype_ops_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inplace_where_pointwise_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_int_input_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_invalid_operand_issue1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_isinf2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_isinf_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_grid_use_block_ptr_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_grid_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_pointwise_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_strided_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_layer_norm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_leaky_relu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lerp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lgamma_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_channels_last_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_rands2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_rands3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_rands_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_rands_sliced_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linear1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linear2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linear_dynamic_maxautotune_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linear_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linear_mixed_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linspace1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linspace3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_dynamic_shape_assertion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_mode_fallback_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_regional_compile_flex_attention_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_regional_compile_invoke_subgraph_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_regional_compile_repeated_blocks_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_log1p_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_log_softmax_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_long_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_1_dim_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mark_dynamic_with_hint_override_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mark_unbacked_with_hint_override_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_masked_fill_promotion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_matmul_layer_norm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_min_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d6_dilation_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d7_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_min_max_reduction_nan_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mixed_mm2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mm_mixed_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mm_views_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multi_device_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multi_gpu_device_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_any_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_sum_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_var_lowp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mutable_custom_op_fixed_layout2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mutable_custom_op_fixed_layout_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mutations_loop_fusion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_assert_inside_triton_kernel_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_sort_stable_False_descending_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_sort_stable_False_descending_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_to_num_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_narrow_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_neg_max_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nll_loss_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nll_loss_forward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_no_specization_over_symbolic_value_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nonzero_unbacked_refinement_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_output_strides_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pad_cast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pad_view_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pattern_matcher_multi_user_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_permute2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pixel_shuffle_channels_last_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_airy_ai_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_v_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_w_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_erf_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_erfcx_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_expit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_gammaincc_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_hermite_polynomial_he_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_i0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_i0e_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_i1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_i1e_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_log_ndtr_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_logit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_multigammaln_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_ndtr_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_psi_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_t_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_u_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_v_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_w_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_polar_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow_by_natural_log2_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_prepare_softmax_with_fast_math_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_progressive, test/inductor/test_compile_subprocess.py::GPUTests::test_randint_distribution_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randint_int64_mod_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randint_kernel_count_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randn_like_empty_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randn_with_dtype_and_device_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remainder_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_clone_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_copy_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice_scatter_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_decomposition_has_clamp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_replication_pad_errors_with_bool_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_require_stride_expanded_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_resize_as_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_resize_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_roi_align_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_roll_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_round_correctness_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_round_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scalar_cpu_tensor_arg_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scalar_input_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scalar_output_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scaled_dot_product_attention_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scaled_dot_product_efficient_attention_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_add2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_add3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_reduce2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scheduler_vertical_fusion1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_unaligned_mask_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_unaligned_mask_freezing_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_searchsorted_broadcast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_shape_prop_torch_ones_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_should_pad_bench_for_bmm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sigmoid_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sign_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_signbit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_simplify_loops_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sin_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_single_elem_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_single_elem_indirect_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sizehint_issue1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_mutation1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_mutation2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter_reinplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_softmax_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_softmax_one_kernel_persist_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cumprod_low_prec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cumsum_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cumsum_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_reduction_dynamic_shape_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_reduction_with_int64_size_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_with_integer_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_with_unbacked_symints_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sqrt_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_squeeze1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_squeeze2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_squeeze_varargs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_stack_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tan_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tensor_index_slice_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_to_device_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_to_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_torch_device_split_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_transpose_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_uint_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unbind_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unfold_zero_dimension_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unsigned_constant_tensors_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_float16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unsqueeze_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_bicubic2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_bilinear2d_b_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_nearest2d_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_nearest2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_nearest3d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_var_mean_div_by_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_var_mean_tile_reduction_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_vectorized_ops_masked_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_vectorized_ops_masked_var_novec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_vertical_fusion1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_view_on_aliased_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_view_uint8_through_differing_bitwidths_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views7_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_xblock_divides_xnumel_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_zeros_cuda
2025-12-04T10:18:48.9128757Z 
2025-12-04T10:18:48.9129247Z Finished inductor/test_compile_subprocess 2/2 ... [2025-12-04 10:18:48.876683][5168.804975674], took 6.75min
2025-12-04T10:18:48.9130816Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-2cb94ab29d3c0df8.xml
2025-12-04T10:18:49.0325917Z Running test_decomp 1/22 ... [2025-12-04 10:18:49.032356][5168.96065298]
2025-12-04T10:18:49.0326335Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:18:49.0329180Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=1', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:18:49.032659]
2025-12-04T10:25:12.1067611Z 
2025-12-04T10:25:12.1069065Z test_decomp 1/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_1.22_14e10bdd16255327_.log
2025-12-04T10:25:12.1245988Z Running 435 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__native_batch_norm_legit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdist_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frac_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvals_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvals_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svdvals_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorsolve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vector_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logdet_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_median_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_layer_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_celu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv1d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv2d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_instance_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_kl_div_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mish_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_number_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pinverse_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_neg_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_lowrank_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_complex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_addcdiv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_addcdiv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_addmv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_right_shift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_logaddexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_rsub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_hypot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_mv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardshrink_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_randn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_std_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_uint8
2025-12-04T10:25:12.1422798Z 
2025-12-04T10:25:12.1423134Z Finished test_decomp 1/22 ... [2025-12-04 10:25:12.107179][5552.035472922], took 6.38min
2025-12-04T10:25:12.1424299Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-ab5aa4d4069f84fb.xml
2025-12-04T10:25:12.2404057Z Running test_decomp 5/22 ... [2025-12-04 10:25:12.240137][5552.168436024]
2025-12-04T10:25:12.2404642Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:25:12.2407176Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=5', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:25:12.240446]
2025-12-04T10:32:20.1281690Z 
2025-12-04T10:32:20.1283742Z test_decomp 5/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_5.22_0df19fb5b56a60f0_.log
2025-12-04T10:32:20.1375510Z Running 411 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cauchy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_igamma_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eig_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvals_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_multi_dot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_slogdet_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorsolve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vecdot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_unpack_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_median_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_bilinear_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_ctc_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_elu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardswish_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_instance_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_linear_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mish_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mse_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multi_margin_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_normalize_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_complex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_lowrank_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triangular_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triangular_solve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_indices_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unravel_index_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__softmax_backward_data_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_addcdiv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_addmv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_bernoulli_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_and_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_left_shift_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_or_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_right_shift_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward__softmax_backward_data_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_special_log_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_squeeze_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_t_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_i0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_igammac_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_lcm_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_vector_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_glu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardtanh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_huber_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_mish_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_mse_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_prelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_normal_number_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_randn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_renorm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_GRU_train_mode_cuda_float64, test/test_decomp.py::DecompOneOffTestsCUDA::test_amp_batch_norm_backward_cuda
2025-12-04T10:32:20.1462471Z 
2025-12-04T10:32:20.1462656Z Finished test_decomp 5/22 ... [2025-12-04 10:32:20.128755][5980.057047772], took 7.13min
2025-12-04T10:32:20.1504733Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-35f77a6004c12574.xml
2025-12-04T10:32:20.6199910Z Uploading artifacts took 0.39 seconds
2025-12-04T10:32:20.6203502Z Running test_decomp 10/22 ... [2025-12-04 10:32:20.620138][5980.548434446]
2025-12-04T10:32:20.6203925Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:32:20.6207236Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=10', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:32:20.620467]
2025-12-04T10:37:04.1376573Z 
2025-12-04T10:37:04.1378754Z test_decomp 10/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_10.22_12f8ad2c135e2bbb_.log
2025-12-04T10:37:04.1468126Z Running 393 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_offsets_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exponential_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_householder_product_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_ex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorinv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matmul_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_batch_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_one_hot_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pinverse_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_gaussian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_sampled_addmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick__batch_norm_with_update_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_not_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_sinc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_frac_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_native_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_nextafter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_glu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_std_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_RNN_eval_mode_cuda_float64
2025-12-04T10:37:04.1552824Z 
2025-12-04T10:37:04.1553017Z Finished test_decomp 10/22 ... [2025-12-04 10:37:04.138274][6264.066567969], took 4.73min
2025-12-04T10:37:04.1600195Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-4e86df63b0f31fa2.xml
2025-12-04T10:37:04.2516678Z Running test_decomp 15/22 ... [2025-12-04 10:37:04.251441][6264.179739275]
2025-12-04T10:37:04.2517419Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:37:04.2520213Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=15', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:37:04.251741]
2025-12-04T10:44:51.0558308Z 
2025-12-04T10:44:51.0559611Z test_decomp 15/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_15.22_3612d41b87c57e18_.log
2025-12-04T10:44:51.0655756Z Running 431 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_inverse_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frac_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_uint16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_det_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvalsh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svd_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorsolve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vector_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_unpack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_median_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_multinomial_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv1d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_selu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_fro_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pca_lowrank_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_hann_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_nuttall_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtri_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_uint32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__softmax_backward_data_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__softmax_backward_data_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_cauchy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward__unsafe_masked_index_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_diag_embed_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_linalg_cross_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_hardshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_max_unpool3d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_hypot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_embedding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_leaky_relu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_softplus_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_renorm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_neg_3_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_uniform_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_GRU_eval_mode_cuda_float32, test/test_decomp.py::DecompOneOffTestsCUDA::test_sdpa_nn_functional_scaled_dot_product_attention_cuda_float16
2025-12-04T10:44:51.0748451Z 
2025-12-04T10:44:51.0748637Z Finished test_decomp 15/22 ... [2025-12-04 10:44:51.056364][6730.984655191], took 7.78min
2025-12-04T10:44:51.0779059Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-421be54fd2475226.xml
2025-12-04T10:44:51.1743507Z Running test_decomp 20/22 ... [2025-12-04 10:44:51.174097][6731.102395458]
2025-12-04T10:44:51.1743902Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:44:51.1746669Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=20', '--num-shards=22', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:44:51.174395]
2025-12-04T10:50:29.4905709Z 
2025-12-04T10:50:29.4907585Z test_decomp 20/22 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_20.22_0285151ec6a3cff1_.log
2025-12-04T10:50:29.5000820Z Running 411 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bincount_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gcd_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cond_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eig_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_power_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_multi_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_slogdet_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorinv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vecdot_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_normal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_median_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nextafter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_bilinear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_group_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_group_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_bicubic_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_kl_div_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softplus_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softplus_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_searchsorted_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_sampled_addmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_decomposed_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_amin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bernoulli_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_complex_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_complex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_softshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_transpose_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_grid_sampler_2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_i0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_lcm_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_native_dropout_backward_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_elu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardtanh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_grad_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_rrelu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_inf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_remainder_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_3_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_std_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_GRU_train_mode_cuda_float32, test/test_decomp.py::HasDecompTest::test_has_decomposition
2025-12-04T10:50:29.5087221Z 
2025-12-04T10:50:29.5087410Z Finished test_decomp 20/22 ... [2025-12-04 10:50:29.491247][7069.419542076], took 5.64min
2025-12-04T10:50:29.5132009Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-269f711d29febed7.xml
2025-12-04T10:50:29.6245436Z Running test_ci_sanity_check_fail 1/1 ... [2025-12-04 10:50:29.624292][7069.552590124]
2025-12-04T10:50:29.6246049Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:50:29.6248578Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ci_sanity_check_fail.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:50:29.624596]
2025-12-04T10:50:42.1243384Z Finished test_ci_sanity_check_fail 1/1 ... [2025-12-04 10:50:42.123913][7082.052205858], took 0.21min
2025-12-04T10:50:42.1458012Z Running test_ops 3/9 ... [2025-12-04 10:50:42.145565][7082.073863143]
2025-12-04T10:50:42.1458543Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:50:42.1461654Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '--shard-id=3', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:50:42.145923]
2025-12-04T11:12:57.0030510Z 
2025-12-04T11:12:57.0031236Z test_ops 3/9 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_3.9_cd041df637bc19f1_.log
2025-12-04T11:12:57.0899519Z Running 3669 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_normal_in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_torch__scaled_mm_cuda_float8_e4m3fn, test/test_ops.py::TestCommonCUDA::test_compare_cpu_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_H_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_block_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_linalg_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ravel_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_resolve_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sigmoid_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unsqueeze_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_vsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes___rmod___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rxor___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_bool_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_polar_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_digamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_dot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_vecdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_tanhshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_permute_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rad2deg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_stft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_abs_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_angle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_any_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_baddbmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bool_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_corrcoef_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diff_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_exp2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_expand_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_triangular_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mH_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nanmean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_embedding_bag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardsigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardswish_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_linear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_nearest_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool3d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_tanhshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_upsample_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_fro_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_normal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_real_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_bartlett_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_y0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_list_args_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_stft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_svd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unsqueeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_where_cuda, test/test_ops.py::TestCommonCUDA::test_errors_amin_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bernoulli_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_item_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ldexp_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_errors_masked_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_min_binary_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_bartlett_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_randn_like_layout2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_zeros_like_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_errors_t_copy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_tril_cuda, test/test_ops.py::TestCommonCUDA::test_errors_where_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch__chunk_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmul___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices__unsafe_masked_index_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gradient_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_2inputs_2outputs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_4inputs_with_extra_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_negative_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_normal_in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rmul___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_byte_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_conj_physical_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagflat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ldexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_reduction_no_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_channel_shuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_cosine_embedding_loss_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pad_circular_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_put_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resize_as__cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_h_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_scaled_modified_bessel_k0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_list_args_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_squeeze_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tile_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gather_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_mean_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logsumexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_channel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_inf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_lowrank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorsolve_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_native_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_squeeze_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unravel_index_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_normal__in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argsort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_hash_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_vander_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmm_decomposed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_ldexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_solve_triangular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_matmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_take_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_torch__scaled_mm_cuda_float8_e4m3fn, test/test_ops.py::TestCommonCUDA::test_out_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_ops.py::TestCommonCUDA::test_out_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_half_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_count_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_deg2rad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log1p_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nan_to_num_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ne_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_ones_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_normal_number_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_pow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unsqueeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_view_as_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_byte_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cross_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diff_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_erf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_floor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_full_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gather_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ge_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_grid_sampler_2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_histogramdd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_binary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_householder_product_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_inv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_matrix_exp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_min_binary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nanquantile_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_batch_norm_without_cudnn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_fractional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multi_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multilabel_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_constant_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_rms_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softsign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ones_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_slice_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sparse_mm_reduce_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_airy_ai_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tensor_split_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_where_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_out_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_logit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_std_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_std_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rsqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_v_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_v_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vecdot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_normal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_T_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_eq_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_renorm_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cauchy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cauchy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_deg2rad_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_deg2rad_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dot_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float8_e4m3fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_istft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vecdot_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_tensor_overload_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_normal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_normal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vdot_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_polar_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exponential_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float8_e5m2, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vecdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_number_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_renorm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exponential_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vecdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_max_reduction_no_dim_cuda, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___radd___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_equal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_item_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_linear_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rms_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_fro_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pinverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_interleave_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize_as__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tile_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_cuda_complex64, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addmm_decomposed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_chalf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_frac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_kthvalue_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ldexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_elu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_outer_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_quantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i0e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_log_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_T_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_addbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diff_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_erfc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ifftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isnan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logical_xor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nextafter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_normal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_rand_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_randint_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_randn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_hann_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_bessel_j1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_vdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_decomposed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bfloat16_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bucketize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cauchy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_chalf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diff_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erfc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isfinite_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_det_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nextafter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nonzero_static_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_fro_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_normal_number_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resize__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_searchsorted_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_transpose_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trunc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rpow___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_acos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cauchy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_permuted_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_gt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isfinite_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isposinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_not_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_unpack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resize__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sinc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i1e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_xlog1py_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_zeta_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_view_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_arange_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_atleast_1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cauchy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_combinations_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_erfinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_eye_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_rfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ge_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_heaviside_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isneginf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_le_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_outer_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_randn_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sinc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_squeeze_multiple_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tensor_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_transpose_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unfold_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsafe_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_vsplit_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view__chunk_cat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_abs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_alias_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifftshift_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flip_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isinf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_item_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log1p_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_or_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ne_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_positive_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_repeat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sum_to_size_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_any_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_3d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bfloat16_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_combinations_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_count_nonzero_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_gather_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_unary_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cond_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log10_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_not_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_matrix_exp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_narrow_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_softsign_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_pca_lowrank_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_mean_unbiased_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tensor_split_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_trapezoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tril_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unbind_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unbind_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unsafe_split_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_zeros_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rsub___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_abs_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_equal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expand_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isreal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_istft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_renorm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_squeeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_squeeze_multiple_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_t_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_take_along_dim_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tril_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_alias_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bool_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_constant_pad_nd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_contiguous_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flip_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_gradient_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isreal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_unary_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eig_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_householder_product_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_not_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logspace_tensor_overload_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mH_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_normal_in_place_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_positive_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rand_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reciprocal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_interleave_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize_as__cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_short_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sinc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_slice_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_list_args_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_squeeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_squeeze_multiple_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_svd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tile_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tril_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unbind_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view_T_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_byte_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atan2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ceil_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cumprod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_deg2rad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diag_embed_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flip_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_float_power_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_floor_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_frac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_geometric_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_gt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lgamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logaddexp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_mul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_empty_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_gelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_positive_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_roll_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_squeeze_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_stack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__segment_reduce_offsets_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_any_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bool_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_byte_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ceil_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cholesky_inverse_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumsum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_dot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_erf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_float_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_frac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_put_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isinf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lerp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cholesky_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_svd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_tensorinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_tensorsolve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log10_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_not_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mH_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_amax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_argmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_cumprod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_logsumexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nan_to_num_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_linear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_trilinear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multi_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu6_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_selu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_outer_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_legendre_polynomial_p_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sub_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_svd_lowrank_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_transpose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_triu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trunc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unsafe_chunk_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake___ror___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_aminmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rdiv___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__chunk_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bool_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_floor_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unbind_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsafe_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_char_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_permute_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rsub___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_squeeze_multiple_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_igamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isnan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_min_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_normal_in_place_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resize_as__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_y0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unbind_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__batch_norm_with_update_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isposinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_or_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_complex32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_int8, test/test_ops.py::TestTagsCUDA::test_tags___rmul___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__chunk_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_char_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_alias_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_all_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atan2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_and_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_clamp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_copysign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_count_nonzero_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diag_embed_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_digamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_eq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_exp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_float_power_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_floor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ge_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_xor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_movedim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_remainder_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sinh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_erfcx_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_stft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sum_to_size_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_to_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_trace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_var_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_view_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_vsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addbmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_any_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bincount_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_cartesian_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_conj_physical_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_constant_pad_nd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_corrcoef_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cumsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagonal_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_eq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_exp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fliplr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_flipud_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fmod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_full_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_gradient_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_heaviside_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_reduce_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isfinite_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ldexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lerp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lgamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eig_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log10_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_argmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_median_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nansum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nextafter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_fro_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_permute_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rand_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_randint_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rsqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scalar_tensor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_short_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_exponential_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_airy_ai_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_list_args_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_t_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tensordot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_ops.py::TestTagsCUDA::test_tags_trace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_true_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unbind_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_uniform_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_zeros_like_cuda_float32
2025-12-04T11:12:57.1746644Z 
2025-12-04T11:12:57.1746839Z Finished test_ops 3/9 ... [2025-12-04 11:12:57.006945][8416.935238594], took 22.25min
2025-12-04T11:12:57.1747485Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-f35e359ea3f52347.xml
2025-12-04T11:12:57.6745956Z Uploading artifacts took 0.48 seconds
2025-12-04T11:12:57.6748264Z Running test_ops 8/9 ... [2025-12-04 11:12:57.674612][8417.602908108]
2025-12-04T11:12:57.6748657Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:12:57.6751815Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '--shard-id=8', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:12:57.674936]
2025-12-04T11:33:20.9917703Z 
2025-12-04T11:33:20.9918436Z test_ops 8/9 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_8.9_44eee4e8e92c270e_.log
2025-12-04T11:33:21.0786183Z Running 3704 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu__chunk_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_resolve_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unsafe_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing___getitem___cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_put_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mul_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_reshape_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unsafe_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes___getitem___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rmatmul___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__chunk_cat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_float_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_alias_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cauchy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_diag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_index_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_relu6_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_smooth_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_normal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sgn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_split_with_sizes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_squeeze_multiple_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_transpose_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_view_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__segment_reduce_offsets_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argwhere_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bernoulli_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bincount_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clone_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_deg2rad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_erf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_imag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_kron_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_householder_product_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_factor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorsolve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_var_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_matrix_exp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_list_of_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_binary_cross_entropy_with_logits_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_similarity_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_embedding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_feature_alpha_dropout_with_train_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gaussian_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_grid_sample_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_local_response_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_logsigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pixel_unshuffle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softmin_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nonzero_static_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_4_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_pow_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_quantile_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resolve_conj_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_gaussian_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sort_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_spherical_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_squeeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_take_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_torch_ops_aten__flash_attention_forward_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unsafe_split_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_errors___ror___cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_errors_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_errors_normal_in_place_cuda, test/test_ops.py::TestCommonCUDA::test_errors_pow_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_mul_layout0_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_mul_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_sum_layout2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_errors_trace_cuda, test/test_ops.py::TestCommonCUDA::test_errors_unbind_copy_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_H_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rand___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_angle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argsort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_geometric_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_kthvalue_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_channel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_unshuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signbit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_airy_ai_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_consecutive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unsafe_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unsafe_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cummin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_gt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hash_tensor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_binary_return_by_ref_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_binary_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_with_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_rad2deg_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_reduce_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_t_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_transpose_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zero__cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_alias_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_permuted_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kthvalue_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_median_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_in_place_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pinverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rand_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sparse_sampled_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_legendre_polynomial_p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsafe_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_hamming_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_view_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_out___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_empty_permuted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nonzero_static_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_normal_in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_bmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_kron_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lu_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_unpack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning__native_batch_norm_legit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_constant_pad_nd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_exp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ge_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_relu6_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_rad2deg_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ravel_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_logit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tril_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unbind_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__segment_reduce_offsets_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__unsafe_masked_index_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addmm_decomposed_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_any_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_as_strided_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bool_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cos_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_count_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cov_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flip_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_frac_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_kron_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lstsq_grad_oriented_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_multi_dot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log1p_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mT_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_multinomial_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_channel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_ctc_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_gaussian_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_nearest-exact_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_nearest_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_kl_div_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_fro_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_nuc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_normal_number_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_permute_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rand_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_roll_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_slice_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_y1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_k0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_spherical_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_svd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unique_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_pointwise_tag_coverage_cuda, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_copysign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_deg2rad_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_float_power_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_lgamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_logit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_zeta_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float8_e5m2fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_add_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_neg_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_unbind_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cauchy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dot_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exponential_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_tensor_overload_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal__in_place_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_renorm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cauchy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float8_e5m2fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_smooth_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cauchy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cauchy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float8_e4m3fn, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float8_e5m2fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_smooth_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal__in_place_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_number_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_renorm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_select_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vdot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_max_binary_cuda, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagflat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_slogdet_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsafe_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_like_cuda_complex64, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rmatmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addcmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_broadcast_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_floor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_kron_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_det_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_triangular_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_as_strided_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_bfloat16_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cdouble_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_combinations_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_flipud_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_floor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_gt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_hash_tensor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_kron_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_lt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ne_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_normal_number_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_zeta_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_to_sparse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unsafe_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_view_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_where_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_char_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_float_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_frac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_geometric_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eigh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vecdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log1p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_xor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ones_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_ndtri_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_zeta_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_squeeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tensor_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_to_sparse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unsafe_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unsafe_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_H_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addcdiv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_asin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_int_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_permute_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rand_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_y0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_squeeze_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_transpose_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unique_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_as_strided_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_broadcast_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_byte_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cartesian_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_char_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_clone_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_conj_physical_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_erf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_fft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_float_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_geometric_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_gt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_kthvalue_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_inv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logical_xor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_narrow_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_new_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_mish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_reshape_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_bessel_j1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_squeeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_triangular_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_zeros_like_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rmul___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_int_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_asin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_column_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isreal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logaddexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_randn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_squeeze_multiple_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_stft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_transpose_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unbind_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unsqueeze_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_all_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_broadcast_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cov_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dist_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_einsum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_expand_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_expm1_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_flatten_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_hstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_binary_return_by_ref_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ldexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_diagonal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_singular_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_slogdet_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logdet_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lu_unpack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_channel_shuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose2d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_normalize_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_permute_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_put_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_renorm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_repeat_interleave_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scalar_tensor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sgn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_split_with_sizes_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_squeeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_true_divide_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_view_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___radd___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rpow___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_all_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_any_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_conj_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_conj_physical_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cumprod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expm1_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log1p_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_not_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_masked_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_permute_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_square_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_stack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_std_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_trace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vdot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__unsafe_masked_index_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bmm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_char_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_corrcoef_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagflat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cross_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_factor_ex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_ex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_triangular_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logaddexp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_unfold_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_outer_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_randn_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rot90_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sgn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sum_to_size_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unsqueeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_where_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rmatmul___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_abs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_contiguous_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_digamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flipud_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_frexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_i0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lerp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_native_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_real_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reshape_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_erfcx_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i1e_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtri_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_transpose_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_as_complex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_abs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addcdiv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_alias_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_allclose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_partial_views_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_baddbmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_broadcast_to_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_char_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cross_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cummin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumprod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_deg2rad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_digamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_floor_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_double_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_expand_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_floor_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_gradient_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_hstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_hypot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isfinite_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_kron_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_factor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_slogdet_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logdet_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_min_binary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_movedim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_multinomial_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_native_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nextafter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_celu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_embedding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_area_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_constant_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pdist_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_rms_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softplus_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_unfold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_upsample_nearest_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_inf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_number_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ormqr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_pca_lowrank_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_4_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_repeat_interleave_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resize__cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_exponential_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_slice_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_i1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_k0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtri_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_split_list_args_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_t_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tensor_split_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tile_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unfold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unique_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_zeros_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rand___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_alias_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_and_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_xor_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_char_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_geometric_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hash_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_silu_complex_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signbit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsafe_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_not_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_or_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_count_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__batch_norm_with_update_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_multiple_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__unsafe_masked_index_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unbind_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expand_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logical_and_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_narrow_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ne_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randn_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scalar_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rand___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_allclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_aminmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bucketize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_byte_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_char_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_floor_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ge_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_item_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_multinomial_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_silu_complex_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_in_place_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randint_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_squeeze_multiple_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unravel_index_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_uint8, test/test_ops.py::TestTagsCUDA::test_tags___getitem___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rdiv___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rsub___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_double_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_short_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_acosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_addr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_block_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_chunk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_empty_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_equal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_exponential_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_frac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_svd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_narrow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ravel_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_signbit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_entr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_i0e_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_logit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_square_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_squeeze_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_t_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_transpose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_triu_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_trunc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unfold_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_angle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argsort_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_asin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_shapes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_chalf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cov_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cummax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_digamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_div_trunc_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_double_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_exp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_exponential_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_hfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_hfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_rfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_frac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_frexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_geqrf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isclose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_istft_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eigvals_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_tensorinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log1p_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logaddexp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lu_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_normalize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nanmedian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nanquantile_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_unfold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_permute_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_remainder_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_repeat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_cosine_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_slice_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_y0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_to_sparse_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_torch__scaled_mm_cuda_float8_e4m3fn, test/test_ops.py::TestTagsCUDA::test_tags_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_ops.py::TestTagsCUDA::test_tags_unbind_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unfold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unique_consecutive_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unsqueeze_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_vdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_zeros_cuda_float32, test/test_ops.py::TestForwardADWithScalarsCUDA::test_0d_tensor_with_python_scalar_div_floor_rounding_cuda_float32
2025-12-04T11:33:21.1629192Z 
2025-12-04T11:33:21.1629396Z Finished test_ops 8/9 ... [2025-12-04 11:33:20.996037][9640.924329935], took 20.39min
2025-12-04T11:33:21.1630025Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-7327fc5de50caef8.xml
2025-12-04T11:33:21.6231928Z Uploading artifacts took 0.45 seconds
2025-12-04T11:33:21.6235846Z Running inductor/test_torchinductor_dynamic_shapes 4/4 ... [2025-12-04 11:33:21.623360][9641.551655226]
2025-12-04T11:33:21.6236357Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:33:21.6240204Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_dynamic_shapes.py', '--shard-id=4', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:33:21.623726]
2025-12-04T11:43:07.7900340Z 
2025-12-04T11:43:07.7901390Z inductor/test_torchinductor_dynamic_shapes 4/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_dynamic_shapes_4.4_d7a417aa701cd416_.log
2025-12-04T11:43:07.8087282Z Running 518 items in this shard: test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test__dyn_quant_matmul_4bit_bf16_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test__dyn_quant_pack_4bit_weight_bf16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test__unsafe_masked_index_put_accumulate_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_abs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool_errors_with_long_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adding_tensor_offsets_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_override_registration_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_support_out_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_arange3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_as_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_assert_size_stride_op_name_pass_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d7_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d_backward3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_batch_norm_2d_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bfloat16_to_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bitwise3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bmm1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_both_scalars_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_add_autotune_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int32_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int64_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_uint8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_buffer_copied_in_graph_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_builtins_round_float_ndigits_pos_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_extern_kernel_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_unbacked_legacy_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cauchy_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_clamp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_clamp_type_promotion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_complex_from_real_imag_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_concat_add_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_config_option_dont_assume_alignment_cudagraphs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_const_int32_to_float_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv1d_depthwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv2d_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv_functional_bn_fuse_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv_inference_heuristics_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_copy_non_blocking_is_pinned_use_cat_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cumprod_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cumsum_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cumsum_pattern_matcher_issue_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cumsum_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_fixed_layout_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div6_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_by_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_presicion_accuracy_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_prim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dropout_trivial_0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dropout_trivial_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_bfloat16_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_bfloat16_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_bfloat16_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int64_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int64_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int64_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_elu_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_empty2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_exact_stride_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_expand_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_expm1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fft_real_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fft_real_input_real_output_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fill1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_flexible_layout_immutable_free_symbols_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_flip_cat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_float16_to_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_float_repr_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fmod_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_forced_buffer_realize_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fractional_max_pool2d3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_full_like_transposed_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fuse_large_params_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_generate_rand_fp8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_getitem_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_glu_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_gpu_scalar_with_gpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_arange2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_grid_sampler_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_horizonal_fusion1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_float_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_propagation_abs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_indirect_load_broadcast_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inductor_assert_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inf_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inner_reduction_detection_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inplace_add_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inplace_where_pointwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_input_mutation5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_insignificant_strides_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_isinf2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_kernel_names_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_broadcast_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_grid_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_strided_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_rands2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linear1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linear2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linspace4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_list_clearing_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_mode_not_decompose_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_regional_compile_invoke_subgraph_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_low_memory_max_pool_dilation_1_dim_3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mark_dynamic_with_hint_override_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_masked_fill_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_matmul_layer_norm_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mix_device_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mul_index_expr_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_multi_device_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_multilayer_any_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mutable_custom_op_fixed_layout2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_nan_assert_inside_triton_kernel_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_nan_to_num_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_narrow_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_new_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_new_ones_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_no_op_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_no_specization_over_symbolic_value_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_output_strides_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pad_view_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_permute1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pixel_shuffle_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_chebyshev_polynomial_u_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_expm1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_gammaln_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_i0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_i0e_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_i1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_i1e_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_modified_bessel_k0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_psi_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_shifted_chebyshev_polynomial_v_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_spherical_bessel_j0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow_by_natural_log2_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randint_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randint_int64_mod_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randn_generator_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reduction5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reflection_pad2d_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reflection_pad2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_relu_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_slice_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_slice_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_view_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_Tensor_decomp_int64_nd_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_roi_align_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_rsqrt_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_rsqrt_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter6_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_add3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_bf16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scheduler_vertical_fusion1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_unaligned_mask_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_select_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_setitem_with_int_parameter_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sigmoid_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_silu_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_simplify_loops_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sizehint_issue1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter_dtype_consistency_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_view_with_graph_break_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_softmax_one_kernel_loop_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_softmax_one_kernel_persist_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_cumsum_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_reduction_dynamic_shape_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sum2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sum3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sum_int_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tensor_index_put_slice_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tmp_not_defined_issue2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_to_memory_format_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_triu_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unbacked_float_item_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unfold_zero_dimension_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unroll_small_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_var_mean_tile_reduction_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_view_as_complex_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_view_as_real_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_weight_norm_bwd_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_weight_norm_conv2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_where_with_logical_op_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_zero_element_mutation_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test__dyn_quant_pack_4bit_weight_bf16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test__unsafe_masked_index_put_accumulate_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_abs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_avg_pool_with_output_size_0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_max_pool2d2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_max_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex_strided_fallback_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_inplace_permuted_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adding_tensor_offsets_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_alexnet_prefix_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aliased_buffer_reuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aoti_eager_with_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_argmax_argmin2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_argmax_argmin_with_nan_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_argmax_to_float_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_as_strided_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_assert_alignment_op_name_fail_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_assert_size_stride_op_name_pass_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool_errors_with_uint_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_baddbmm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_batch_norm_2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bmm2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_default_kwargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int32_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int64_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_uint8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_uint8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_nd_tiling_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_float_ndigits_neg_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_float_ndigits_pos_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_int_ndigits_pos_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_int_ndigits_zero_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_clamp_type_promotion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_compar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_complex_from_real_imag_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_computed_buffer_inlining_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_config_option_dont_assume_alignment_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_config_option_dont_assume_alignment_recompiles_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_consecutive_split_cumprod_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_consecutive_split_cumsum_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_const_int32_to_float_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv1d_depthwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv1d_with_permute_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv2d_backward_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv3d_channels_last_use_block_ptr_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_bn_fuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_functional_bn_fuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_inference_heuristics_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cpu_scalar_with_gpu_tensor_dynamic_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cudnn_rnn_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumsum_inf_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumsum_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_default_layout_constraint_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_fixed_layout_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_scan_op_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_deterministic_codegen_on_graph_break_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div9_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_by_zero_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_precision_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_softmax_symfloat_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dropout2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dropout_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtype_mismatch_issue_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_bfloat16_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_bfloat16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float16_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float16_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_bfloat16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_elu_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_empty1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_empty2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_emulate_precision_triton_fp_fusion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_erfinv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_exact_stride_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_exp2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_expand_as_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_expand_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_expanded_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fallback_mutable_op_basic_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fallback_mutable_op_list_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fill1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fill2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_flexible_layout_immutable_free_symbols_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_flip_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_float_index_expression_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_float_index_expression_type_promotion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_float_repr_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_floordiv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fmod_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_forced_buffer_realize_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_full_truncation_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_generate_rand_fp8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_glu_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_arange1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_constant_tensor1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_refcount_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_unbacked_symint_as_output_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_grid_sampler_2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_propagation_remainder_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_failed_reinplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_fallback2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_remainder_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_select_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_indirect_load_broadcast_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inf_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inplace_add_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_input_mutation2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_input_mutation5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_isinf_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_kwargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_broadcast_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_pointwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_leaky_relu_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_like_rands_sliced_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linear_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linear_mixed_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linspace4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_list_clearing_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_lite_dynamic_shape_assertion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_log_fp64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_logaddexp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_logcumsumexp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_logsumexp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_low_memory_max_pool_dilation_1_dim_3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mark_unbacked_with_hint_override_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_masked_fill_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d6_dilation_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d_with_indices_backward5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mix_device_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_move_arange_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multi_gpu_device_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multilayer_prime_size_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mutable_custom_op_fixed_layout2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_nan_sort_stable_False_descending_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_narrow_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_needs_contiguous_strides_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_new_ones_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_nll_loss_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_nll_loss_forward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_norm_constant_overflow_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_output_strides_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pad_single_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pattern_matcher_unbacked_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_permute1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pixel_shuffle_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_chebyshev_polynomial_v_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_chebyshev_polynomial_w_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_digamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_erfc_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_erfcx_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_erfinv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_exp2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_gammaincc_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_i0e_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_log1p_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_modified_bessel_i0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_modified_bessel_i1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_modified_bessel_k0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_multigammaln_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_round_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_shifted_chebyshev_polynomial_u_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_shifted_chebyshev_polynomial_w_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_xlog1py_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_zeta_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_randint_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction_config_limit_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_relu_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remove_noop_clone_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remove_noop_view_default_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_as_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_decomposition_has_clamp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_require_stride_expanded_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_round_correctness_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_rsqrt_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scalar_cpu_tensor_arg_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scalar_output_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scaled_dot_product_attention_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter_add2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter_reduce1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter_reduce2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sdpa_unaligned_mask_freezing_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_select_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_shape_prop_torch_ones_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_single_elem_indirect_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_mutation2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_mutation3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_scatter4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_scatter_reinplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_softmax_one_kernel_loop_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sort_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sort_stable_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_failed_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_with_integer_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_with_list_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_with_sizes_with_unbacked_symints_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_stride_preservation_with_stride_modifying_fx_pass_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tanh_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tensor3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tensor_index_slice_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tmp_not_defined_issue1_use_block_ptr_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_to_device_constant_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_to_device_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_topk_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_transpose_add_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_transpose_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_triton_argmin_argmax_transpose_logical_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_triton_kernel_bool_param_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unsqueeze_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_upsample_cat_conv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_upsample_nearest3d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_var_correction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_view_as_complex_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_view_on_aliased_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_view_uint8_through_differing_bitwidths_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views7_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_weight_norm_bwd_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_weight_norm_conv2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_where_with_logical_op_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_zero_dim_reductions_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_zero_element_mutation_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_zeros_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_cat_unbacked_duplicate_size_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_constant_fold_uniform_value_dynamic_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_float_item_inf_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_float_item_return_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_item_return_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_item_unbacked_stride_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_mark_unbacked_slice_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op5_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op9_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_multi_output_unbacked_custom_op_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_non_persistent_dynamic_rblock_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_noops_tensor_repropagate_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_slice_index_changing_sign_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_sym_stride_lowering_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_cat_backwards_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_index_select_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unwrap_storage_didnt_work_repro_cuda
2025-12-04T11:43:07.8263435Z 
2025-12-04T11:43:07.8263741Z Finished inductor/test_torchinductor_dynamic_shapes 4/4 ... [2025-12-04 11:43:07.790900][10227.719191973], took 9.77min
2025-12-04T11:43:07.8264674Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-e6d2768dce09d0dd.xml
2025-12-04T11:43:07.9101187Z Running inductor/test_torchinductor_opinfo 2/13 ... [2025-12-04 11:43:07.909876][10227.838174483]
2025-12-04T11:43:07.9101723Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:43:07.9104140Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=2', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:43:07.910155]
2025-12-04T11:52:05.9037694Z 
2025-12-04T11:52:05.9040055Z inductor/test_torchinductor_opinfo 2/13 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_2.13_c074395000e8f728_.log
2025-12-04T11:52:05.9130210Z Running 263 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___radd___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___ror___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rxor___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rxor___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__softmax_backward_data_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcmul_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_right_shift_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cauchy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_max_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_corrcoef_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumsum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dist_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_floor_rounding_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eq_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gcd_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_heaviside_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_item_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_with_dtype_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logaddexp2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matmul_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_with_dim_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_dropout_backward_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nextafter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_elu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_quantile_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rad2deg_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_neg_3_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_add_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_exponential_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_gaussian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_mm_reduce_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1e_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1e_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_unbiased_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensordot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__efficient_attention_forward_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triangular_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unravel_index_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_complex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_float64
2025-12-04T11:52:05.9215455Z 
2025-12-04T11:52:05.9215722Z Finished inductor/test_torchinductor_opinfo 2/13 ... [2025-12-04 11:52:05.904431][10765.832724178], took 8.97min
2025-12-04T11:52:05.9267808Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-52326583abfcb307.xml
2025-12-04T11:52:06.0229283Z Running inductor/test_torchinductor_opinfo 7/13 ... [2025-12-04 11:52:06.022698][10765.95099546]
2025-12-04T11:52:06.0229769Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:52:06.0232720Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=7', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:52:06.023001]
2025-12-04T12:00:48.3400756Z 
2025-12-04T12:00:48.3403594Z inductor/test_torchinductor_opinfo 7/13 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_7.13_206d55120439a46b_.log
2025-12-04T12:00:48.3567952Z Running 252 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rand___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__segment_reduce_lengths_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addbmm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_arange_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_baddbmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bernoulli_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bool_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bucketize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cholesky_inverse_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_complex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfc_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flatten_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_half_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_heaviside_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_igamma_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_solve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_power_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_rank_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_multi_dot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_triangular_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_svd_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_svdvals_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_tensor_overload_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_tensor_overload_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_batch_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_neg_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nextafter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_elu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_group_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardswish_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_linear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_normalize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_selu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_3_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_exponential_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_sampled_addmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_lowrank_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensordot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_uniform_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unravel_index_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vstack_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_int64
2025-12-04T12:00:48.3726791Z 
2025-12-04T12:00:48.3727270Z Finished inductor/test_torchinductor_opinfo 7/13 ... [2025-12-04 12:00:48.341053][11288.269344819], took 8.71min
2025-12-04T12:00:48.3728879Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-b0e5cfa73b17bf79.xml
2025-12-04T12:00:48.9296959Z Uploading artifacts took 0.47 seconds
2025-12-04T12:00:48.9301249Z Running inductor/test_torchinductor_opinfo 12/13 ... [2025-12-04 12:00:48.929875][11288.858170883]
2025-12-04T12:00:48.9301942Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:00:48.9305405Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=12', '--num-shards=13', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:00:48.930239]
2025-12-04T12:11:53.7857524Z 
2025-12-04T12:11:53.7858949Z inductor/test_torchinductor_opinfo 12/13 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_12.13_b0b968134062f752_.log
2025-12-04T12:11:53.7945947Z Running 259 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmatmul___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rsub___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_allclose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_allclose_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_left_shift_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_right_shift_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_byte_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfc_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfinv_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftshift_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flatten_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_power_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frac_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_uint32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gcd_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_heaviside_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_histc_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hsplit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_put_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_put_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lcm_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lgamma_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cholesky_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cholesky_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eig_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigvalsh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_householder_product_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log1p_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_with_dtype_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logit_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_fill_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_log_softmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_mean_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_celu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_linear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_selu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_static_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ormqr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_qr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rand_like_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_roll_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_roll_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_sum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_scatter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_hamming_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sort_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapezoid_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_consecutive_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_consecutive_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_float16
2025-12-04T12:11:53.8030811Z 
2025-12-04T12:11:53.8031094Z Finished inductor/test_torchinductor_opinfo 12/13 ... [2025-12-04 12:11:53.786124][11953.714418457], took 11.08min
2025-12-04T12:11:53.8093061Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-44c334397fb0c3bd.xml
2025-12-04T12:11:53.8847650Z Running inductor/test_cuda_repro 1/1 ... [2025-12-04 12:11:53.884448][11953.812743979]
2025-12-04T12:11:53.8848262Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:11:53.8850569Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cuda_repro.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:11:53.884764]
2025-12-04T12:13:14.9904834Z 
2025-12-04T12:13:14.9905869Z inductor/test_cuda_repro 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cuda_repro_1.1_1c12e5d6528c7a17_.log
2025-12-04T12:13:14.9931305Z Running 96 items in this shard: test/inductor/test_cuda_repro.py::CudaReproTests::test_3d_tiling, test/inductor/test_cuda_repro.py::CudaReproTests::test_accuracy_issue1, test/inductor/test_cuda_repro.py::CudaReproTests::test_adaptive_avg_pool3d_issue_157248, test/inductor/test_cuda_repro.py::CudaReproTests::test_atomic_add_bfloat16, test/inductor/test_cuda_repro.py::CudaReproTests::test_autotune_inplace_kernel, test/inductor/test_cuda_repro.py::CudaReproTests::test_backward_context, test/inductor/test_cuda_repro.py::CudaReproTests::test_bool_emulate_low_precision, test/inductor/test_cuda_repro.py::CudaReproTests::test_bucketize_dynamic_dense, test/inductor/test_cuda_repro.py::CudaReproTests::test_bucketize_epilogue, test/inductor/test_cuda_repro.py::CudaReproTests::test_cat_int8_one_kernel, test/inductor/test_cuda_repro.py::CudaReproTests::test_cpu_index, test/inductor/test_cuda_repro.py::CudaReproTests::test_deterministic_algorithms, test/inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses, test/inductor/test_cuda_repro.py::CudaReproTests::test_dtype_factory_issue, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_persistent_reductions, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_shapes, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_to_static_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding, test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned, test/inductor/test_cuda_repro.py::CudaReproTests::test_embedding_var_mean, test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_low_precision, test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_mean_ratio_chain, test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_min_pow_chain, test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_norm_rounding, test/inductor/test_cuda_repro.py::CudaReproTests::test_epilogue_fusion_with_view, test/inductor/test_cuda_repro.py::CudaReproTests::test_expanded_inputs_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_expanded_inputs_cudagraphs_no_size_asserts, test/inductor/test_cuda_repro.py::CudaReproTests::test_flash_attention_dynamic, test/inductor/test_cuda_repro.py::CudaReproTests::test_float64_constants, test/inductor/test_cuda_repro.py::CudaReproTests::test_float8_e8m0fnu, test/inductor/test_cuda_repro.py::CudaReproTests::test_full_copy, test/inductor/test_cuda_repro.py::CudaReproTests::test_identity_load, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_add_fallback, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_inplace_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_issue, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_no_fallback_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_indirect_indexing_dense_mask, test/inductor/test_cuda_repro.py::CudaReproTests::test_inductor_output_aliases_intermediate, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_add_alpha_autotune, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_buffer_autotune, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_updates_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_input_channels_last, test/inductor/test_cuda_repro.py::CudaReproTests::test_int64_index_intermediate, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue100806, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue103461, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue103481, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue104759, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue97695_1input, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue97695_2input, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue_103924, test/inductor/test_cuda_repro.py::CudaReproTests::test_libdevice_routing, test/inductor/test_cuda_repro.py::CudaReproTests::test_linear_cpu_input, test/inductor/test_cuda_repro.py::CudaReproTests::test_linear_with_zero_infeature_size, test/inductor/test_cuda_repro.py::CudaReproTests::test_lookup_seed_backward, test/inductor/test_cuda_repro.py::CudaReproTests::test_max_autotune_nograd, test/inductor/test_cuda_repro.py::CudaReproTests::test_memory_history_inductor, test/inductor/test_cuda_repro.py::CudaReproTests::test_mm_out_dtype_compile, test/inductor/test_cuda_repro.py::CudaReproTests::test_multi_output_layout_fallback, test/inductor/test_cuda_repro.py::CudaReproTests::test_mutated_aligned_tensor, test/inductor/test_cuda_repro.py::CudaReproTests::test_negative_arange_dynamic_shapes, test/inductor/test_cuda_repro.py::CudaReproTests::test_no_device_idx_repro_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_non_commutative_scan_op, test/inductor/test_cuda_repro.py::CudaReproTests::test_non_contiguous_unaligned_input_indices, test/inductor/test_cuda_repro.py::CudaReproTests::test_normalize_norm_leq_one, test/inductor/test_cuda_repro.py::CudaReproTests::test_not_initializing_wrong_device, test/inductor/test_cuda_repro.py::CudaReproTests::test_permute_fusion, test/inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile, test/inductor/test_cuda_repro.py::CudaReproTests::test_red_dtype_mismatch, test/inductor/test_cuda_repro.py::CudaReproTests::test_reflection_pad_loop_order, test/inductor/test_cuda_repro.py::CudaReproTests::test_repeated_masked_load, test/inductor/test_cuda_repro.py::CudaReproTests::test_scalar_triton_index, test/inductor/test_cuda_repro.py::CudaReproTests::test_scaled_dot_product_efficient_attention_backward, test/inductor/test_cuda_repro.py::CudaReproTests::test_scatter_index_not_wrapped, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape0_quantiles_strides0_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape1_quantiles_strides1_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape2_quantiles_strides2_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape3_quantiles_strides3_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape4_quantiles_strides4_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape5_quantiles_strides5_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape6_quantiles_strides6_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape7_quantiles_strides7_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_selecsls42b_misaligned_address, test/inductor/test_cuda_repro.py::CudaReproTests::test_simplify_dims, test/inductor/test_cuda_repro.py::CudaReproTests::test_sort_stride_issue, test/inductor/test_cuda_repro.py::CudaReproTests::test_sorted_masks, test/inductor/test_cuda_repro.py::CudaReproTests::test_split_reduction_channels_last, test/inductor/test_cuda_repro.py::CudaReproTests::test_split_reduction_transposed, test/inductor/test_cuda_repro.py::CudaReproTests::test_triton_interpret, test/inductor/test_cuda_repro.py::CudaReproTests::test_truediv_base_not_bitwise_equivalent, test/inductor/test_cuda_repro.py::CudaReproTests::test_truediv_emulate_divison_rounding, test/inductor/test_cuda_repro.py::CudaReproTests::test_uint_view_copy, test/inductor/test_cuda_repro.py::CudaReproTests::test_unspec_inputs_interop, test/inductor/test_cuda_repro.py::CudaReproTests::test_unused_cpu_input_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_view_replay_padding_issue_163328, test/inductor/test_cuda_repro.py::CudaReproTests::test_xlnet_lm_stride_repro
2025-12-04T12:13:14.9953941Z 
2025-12-04T12:13:14.9954170Z Finished inductor/test_cuda_repro 1/1 ... [2025-12-04 12:13:14.990367][12034.918655146], took 1.35min
2025-12-04T12:13:15.0134591Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3098e6f6c63481df.xml
2025-12-04T12:13:15.1291233Z Running inductor/test_compiled_autograd 1/2 ... [2025-12-04 12:13:15.128819][12035.057111759]
2025-12-04T12:13:15.1291714Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:13:15.1294467Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compiled_autograd.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:13:15.129170]
2025-12-04T12:20:32.3824219Z 
2025-12-04T12:20:32.3825785Z inductor/test_compiled_autograd 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compiled_autograd_1.2_dcced55d4b6d289b_.log
2025-12-04T12:20:32.3974890Z Running 438 items in this shard: test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_3, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_1_5_2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_3_1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_accumulate_grad_polyfill_case_3_2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_anomaly_mode_already_nan, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_anomaly_mode_backward, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_anomaly_mode_grad, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_basic_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_data_dependent_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_id_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_non_traceable, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_dynamic_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_float_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_int_is_traceable_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_autograd_cpp_node_saved_int_is_traceable_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_backward_hook_relative_ordering_partial, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cache_hit, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_checkpointing_sac, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_checkpointing_simple_reentrant_False, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_checkpointing_simple_reentrant_True, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_compile_backend_aot_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_compile_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_compile_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_api_optimize_backend_aot_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_disable_api_compile_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compile_api_disable_api_compile_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_compiled_autograd_does_not_specialize_on_bw_symints, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cpu_offloading, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_cpu_graph, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_cpu_scalar_used_in_cpp_custom_op, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_cpu_scalar_used_in_python_custom_op, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_cudagraphs_sdpa, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_bw_graph_break, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_compiled_fw_bw_graph_break, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_dynamically_defined_class, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_multiple_grads, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_attr, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_multiple_tensors, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_custom_fn_saved_tensors, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_ddp_cpp_reducer_error, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_ddp_python_reducer, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_disk_offloading, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamic_shapes_annotations, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamic_shapes_eager_node, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_dynamo_boxed, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_flex_attention, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_free_activation_memory_subclass, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_higher_order_gradients, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_hipify_not_loaded_with_import_cpp_extension, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_hipify_not_loaded_with_import_torch, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_inplace_grad_update, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_inputs_aliasing_bytecode_stack_restore, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_issue106555, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_keep_graph_usage_after_compiled, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_logging_tensor_flaky, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_optimize_assert_backend_aot_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_optimize_assert_backend_eager, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_optimize_assert_backend_inductor, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_output_nodes_all_leaves, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_multi_pre_hooks, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reorder_multi_tensor_pre_hooks, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_reset, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_saved_tensor_unpack_hook_ordering, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_tensor_grad_hook1, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_tensor_grad_hook2, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_torch_compile_only_backward_call, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_torch_function_mode, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_trace_run_with_rng_state, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_aot_dispatcher_nodes, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_aot_dispatcher_nodes_hop, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_cpp, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_dynamic_shapes, test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_verbose_logs_snapshot, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_access_saved_tensor_twice_without_recomputation_works, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_posthooks_can_observe_tensor_prehook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_posthooks_should_not_execute, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_accumulate_grad_with_zero_numel_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_anomaly_assign_parent_cleanup, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_anomaly_detect_nan, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_anomaly_mode_no_check_nan, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_inplace_view_of_view, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_inplace_views_creation_meta, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_inplace_views_cross_dtype, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_multiple_views_python, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_simple_views_python, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_views_codegen, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_badcalls, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_copy, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_create_graph_warns, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_hook_relative_ordering, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_no_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_twice_retained_graph_with_saved_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_twice_with_saved_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_backward_with_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_calculate_shape_util, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_callback_adds_callback, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_cant_create_saved_tensors, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpoint_detects_non_determinism, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpoint_graph_execution_group, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpoint_valid_reset_on_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_correct_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_custom_function_works, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_dataparallel, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_detached_tensor_use_reentrant_False, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_input_requires_grad_False, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_input_requires_grad_True, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_checkpointing_without_reentrant_memory_savings, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_create_graph_and_full_backward_hook_cycle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_current_graph_task_execution_order, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_autograd_ac_early_stop, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_autograd_no_early_free, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_autograd_repeated_grad_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_cycle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_exception, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_non_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_non_tensor_before_tensor_args, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_forward_mode_wrong_formula, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_mark_dirty_not_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_preserve_torch_function_when_return_as_is, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_saved_tensors, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_saving_mutated_view_no_leak, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_setup_context_simple, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_custom_function_vmap_defaults, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_deep_reentrant, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_dep_nograd, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_dependent_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_detach_base, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_detach_then_inplace_raises_in_autograd, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_disabling_saved_tensor_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_disabling_saved_tensor_hooks_nested, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_duplicate_backward_root, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_enable_grad_decorator_no_paren, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_first_grad_fn_access_in_no_grad_mode, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_free_deep_graph_complicated, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_free_deep_graph_pyfunction, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_get_data_and_hooks_from_raw_saved_variable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_batched_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_empty_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_badcalls, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_input_metadata, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_prehooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_fn_prehooks_multiple_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_nonleaf, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_nonleaf_register_hook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_thread_safety, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_to_node_materialize, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_grad_unreachable_discovery, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_check_batched_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_check_forward_or_backward_only, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_complex_non_complex_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_custom_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_dense_and_sparse_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_forward_ad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_forward_ad_respects_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_forward_ad_runs_with_no_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_input_layout2, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_input_layout4, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_output_shape_or_dtype_depend_on_values, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_test_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_gradcheck_validates_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_graph_save_on_cpu, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_edge_case_when_called_with_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hook_none, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_hooks_cpp, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_indexing, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_not_requires_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_leaf_errors, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_weak_grad_fn, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_integer_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_legacy_function_deprecation_exception, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_lobpcg, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_mark_non_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_materialize_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_multi_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_multi_backward_no_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_named_tensor_for_complex_views, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_naughty_anomaly_access, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_naughty_autograd_function_stashing_ctx, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_nested_anomaly_printstack_cleanup, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_next_functions, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_grad_python_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_requires_grad_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_no_unnecessary_save, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_not_implemented_fwad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_pickle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_gets_cleaned_up, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_post_accumulate_grad_hook_returns_not_None, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_pow_zero_tensor_gradient, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_power_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_prehook_ordering, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_aggregation_table, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_function_event_avg, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_seq_nr, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_profiler_shapes, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_record_function, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_child_error, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_with_callbacks_depth_0, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_reentrant_with_leaf_variable_hook, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_requires_grad_, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retain_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retain_grad_cycle, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_retains_grad_inplace_multiple_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_duplicate, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_duplicate_inplace, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_return_leaf, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_save_none_for_backward, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_save_on_cpu_and_checkpoint, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_save_output_nr, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_tensor_hooks_custom_function_intermediates, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_tensor_hooks_extra_enter_during_bw_no_leak, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_packing_unpacking_did_not_save_original_with_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_packing_unpacking_saved_original_with_default_hooks, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_saved_variable_version_counter, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_scalar_grad_mixed_device, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_select_expanded_v, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_data_tensorimpl_type, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_coroutines_benign_exceptions, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_enabled_wraps, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_grad_generator_functions, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_set_materialize_non_diff_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_shape, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sharded_grad, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_both_scalar, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_dim_neg, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_sparse_gather_ind_scalar, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_tensor_grad_warnings, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_tensor_hooks_inplace_multiple_outputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_thread_shutdown, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_too_many_grads, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_unrelated_inputs, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_unused_output, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_var_mean_differentiable, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_version_counter, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_view_func_replay_with_modified_state, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_volatile_deprecated, test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_will_engine_execute_node, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_early_stop_False, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_kwargs_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_non_tensor_inputs_and_outputs_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_reentrant_backwards_early_stop_False, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_reentrant_backwards_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_same_graph_early_stop_True, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_two_children_early_stop_False, test/inductor/test_compiled_autograd.py::TestNestedCheckpointWithCompiledAutograd::test_nested_checkpoint_two_children_early_stop_True, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_abstract_impl_on_existing_op, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_abstract_impl_on_existing_op_with_CompositeExplicitAutograd, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_dict_grad_for_nontensor, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_incorrect_schema_mutable, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_incorrect_schema_no_output, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_impl_on_existing_op_with_key_key_AutogradCUDA, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_output_differentiability_tensorlist, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_backward_tensorlist_input_requires_list_grads_with_same_numel, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_basic_make_fx, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_data_dependent_basic, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_data_dependent_nms_dynamic_compile, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_defined_in_python, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_duplicate_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_abstract_overload, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_device_cpu, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_invalid_devices, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_multiple, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_on_existing_op_with_cpu_registration_key_CPU, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_on_existing_op_with_cpu_registration_key_CompositeImplicitAutograd, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_impl_separate, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_infer_schema_supported, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_infer_schema_unsupported, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_invalid_qualname, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_invalid_schemas, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_is_functional_schema, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_is_tensorlist_like_type, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_legacy_define, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_legacy_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_meta_for_data_dependent_shape_operation, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_name_must_match, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_new_data_dependent_symint, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_override_impl, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_override_meta, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_private_ctor, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_supported_param_types, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_symints, test/inductor/test_compiled_autograd.py::TestCustomOpWithCompiledAutograd::test_unsupported_schemas, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_allow_python_side_effects_utility, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_constants, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_input_num, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_numpy_number, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_tracked, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_capture_untracked_global_nested, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_branches_no_arguments, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_free_variable_in_both_branches, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_graph_break_in_one_branch, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_pytree_operands, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_side_effect_in_one_branches, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_cond_with_constant_pred, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_fallback_on_graph_break_simple, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_freevars_as_inputs_to_wrap, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_grad_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_hints_wrapper_no_hints, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_hopify_generic_wrap, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_internal_nonlocal, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_lift_tensors_with_compound_expressions, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_kwargs, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_lowers_to_graph, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_multi_return, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_pytree_return, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_map_subgraph_name_is_valid, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_nested_tuple_output, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_nested_wrap, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_no_freevars, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_output_with_dict, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_register_subclass, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_return_captured_var, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_return_captured_var_used_multiple_times, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_return_captured_vars, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_del_existing_attr_global_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_del_existing_attr_nonlocal_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_local_list_append_no_graph_break, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_list, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_num, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_num_builtin, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_global_tensor, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_nonlocal_num, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_nonlocal_num_builtin, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_mutate_nonlocal_tensor_builtin, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_nested_nonlocal_list_append_graph_break, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_nonlocal_list_append_graph_break, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_existing_attr_global_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_existing_attr_global_obj, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_existing_attr_nonlocal_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_side_effect_set_new_attr_global_module, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_symint_in_slice, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_unbacked_symbol_closure, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_vmap_multiply_scalar, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_vmap_source_fn_stack, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_allow_local_assign_in_body_fn, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_inductor_compiled_regions_option, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_default_else_branch, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_only, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_kwarg_recompile, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_pytree_kwargs, test/inductor/test_compiled_autograd.py::HigherOrderOpTestsWithCompiledAutograd::test_wrap_source_fn_stack, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_functional_call_sequential_params_and_buffers, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_call_compiled_backward_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_call_torch_compile_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_fn_with_kwargs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_freevar_python_scalar, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_freevar_tensor, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_pytree, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_recompile, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_with_graph_break, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_grad_with_side_effect, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_hessian, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_hessian_argnums, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacfwd, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacfwd_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacrev_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jacrev_two_tensors_argnums, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_call_torch_compile_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_freevar_tensor, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_simple, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_jvp_two_tensors_has_aux, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_teardown_resets_nested_graph_breaks, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp_call_compiled_backward_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp_multiple_outputs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vjp_multiple_outputs_python_struct, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_call_torch_compile_fn, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_free_const, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_invocation_in_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_invocation_out_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_multiple_outputs_diff_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_over_vmap_captured, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_pytree_inputs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile_different_config, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_recompile_same_config, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_side_effects, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_side_effects_append_input, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_two_inputs, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_two_inputs_tuple_in_dims, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_with_conditional_graph_break, test/inductor/test_compiled_autograd.py::FuncTorchHigherOrderOpTestsWithCompiledAutograd::test_vmap_with_graph_break, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_cond_with_invalid_kwargs, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_dropout_inductor, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_flop_counter_for_cond, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_flop_counter_for_cond_unbalanced_branches, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_function, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_module, test/inductor/test_compiled_autograd.py::ActivationCheckpointingTestsWithCompiledAutograd::test_non_aliasing_util, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_device_mesh_compile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_basic_export, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_constructor_w_dynamo_disable, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_constructor_w_graph_break, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_different_gradient_placement, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dont_recompile_on_same_placement_devicemesh, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamic, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamic_loss_parallel_log_softmax, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamic_slice, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_dynamo_device_mesh_attrs, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_partial_placement_graph_output, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_partial_placement_redistribute_unbalanced_correct_strides, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dtensor_requires_grad_recompile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_from_local_dynamic_shapes, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_from_local_redistribute, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_from_local_redistribute_async, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_dtensor_recompile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_from_local_grad_placements_sequence_intermediate, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_from_local_grad_placements_sequence_intermediate_as_args, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_grad_placements_sequence, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_grad_placements_sequence_intermediate, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_kwargs, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_dynamo_to_local_kwargs_forward_hook, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_fakify_dtensor, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_graph_input_is_async, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_placement_compile, test/inductor/test_compiled_autograd.py::TestDTensorCompileWithCompiledAutograd::test_unwrap_async_collective_tensor_tangent, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_cond_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_invoke_quant_packed_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_invoke_subgraph_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_map_nested_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_map_simple_cuda_float32, test/inductor/test_compiled_autograd.py::TestCompiledAutogradOpInfoCUDA::test_hops_in_bwd_while_loop_simple_cuda_float32
2025-12-04T12:20:32.4118574Z 
2025-12-04T12:20:32.4118833Z Finished inductor/test_compiled_autograd 1/2 ... [2025-12-04 12:20:32.383014][12472.311307639], took 7.29min
2025-12-04T12:20:32.4119674Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compiled_autograd/inductor.test_compiled_autograd-d8fc516c8be54fc6.xml
2025-12-04T12:20:32.4905559Z Running inductor/test_layout_optim 1/1 ... [2025-12-04 12:20:32.490316][12472.418613128]
2025-12-04T12:20:32.4906150Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:20:32.4908647Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_layout_optim.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:20:32.490611]
2025-12-04T12:20:37.8788968Z 
2025-12-04T12:20:37.8790500Z inductor/test_layout_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_layout_optim_1.1_85312b2aa31c9171_.log
2025-12-04T12:20:37.8791336Z Running 0 items in this shard:
2025-12-04T12:20:37.8791517Z 
2025-12-04T12:20:37.8791822Z Finished inductor/test_layout_optim 1/1 ... [2025-12-04 12:20:37.878647][12477.806944429], took 0.09min
2025-12-04T12:20:37.9013785Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_layout_optim/inductor.test_layout_optim-ff0e0fc528f4f3dd.xml
2025-12-04T12:20:37.9309908Z Running dynamo/test_exc 1/1 ... [2025-12-04 12:20:37.930781][12477.859080493]
2025-12-04T12:20:37.9310605Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:20:37.9313481Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_exc.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:20:37.931075]
2025-12-04T12:20:43.7560415Z 
2025-12-04T12:20:43.7561625Z dynamo/test_exc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_exc_1.1_e6b03d521cd15643_.log
2025-12-04T12:20:43.7564414Z Running 10 items in this shard: test/dynamo/test_exc.py::ExcTests::test_backend_suppress_line, test/dynamo/test_exc.py::ExcTests::test_graph_break_log, test/dynamo/test_exc.py::ExcTests::test_graph_break_log_generic_jump, test/dynamo/test_exc.py::ExcTests::test_internal_error_no_suppress, test/dynamo/test_exc.py::ExcTests::test_internal_error_suppress_errors, test/dynamo/test_exc.py::ExcTests::test_not_implemented_error, test/dynamo/test_exc.py::ExcTests::test_trigger_bisect_on_error, test/dynamo/test_exc.py::ExcTests::test_trigger_on_error, test/dynamo/test_exc.py::ExcTests::test_unsupported_error, test/dynamo/test_exc.py::ExcTests::test_unsupported_real_stack
2025-12-04T12:20:43.7566657Z 
2025-12-04T12:20:43.7566850Z Finished dynamo/test_exc 1/1 ... [2025-12-04 12:20:43.755525][12483.683814946], took 0.10min
2025-12-04T12:20:43.7790662Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_exc/dynamo.test_exc-59dcc92175511a1b.xml
2025-12-04T12:20:43.8117563Z Running inductor/test_aot_inductor_arrayref 1/2 ... [2025-12-04 12:20:43.811483][12483.739781926]
2025-12-04T12:20:43.8118091Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:20:43.8120869Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_arrayref.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:20:43.811808]
2025-12-04T12:26:56.1965797Z 
2025-12-04T12:26:56.1967388Z inductor/test_aot_inductor_arrayref 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_arrayref_1.2_f8b4577ce160ed2e_.log
2025-12-04T12:26:56.2049360Z Running 159 items in this shard: test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_2_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_2_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_amp_fallback_random_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_constant_tensor_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_constant_tensor_name_collision_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printer_codegen_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printer_fp8_dtype_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printer_sym_inputs_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printer_user_defined_triton_kernel_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printing_model_inputs_codegen_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_profiler_enable_kernel_profile_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_profiler_enable_kernel_profile_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_runtime_asserts_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_autotune_int64_user_defined_triton_kernel_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_backward_no_op_logging_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_bool_input_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_buffer_mutation_1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_buffer_mutation_2_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_buffer_mutation_3_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_buffer_reuse_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_clamp_decomposition_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_codegen_int_array_var_fix_memory_leak_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_composed_dynamic_size_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_cpu_predicate_cuda_operands_max_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_mismatched_branch_output_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_nested_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_non_tensor_predicates_dynamic_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_share_predicate_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_symint_input_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_unbacked_symint_closure_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_use_buffers_from_outer_scope_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_with_reinterpret_view_inputs_outputs_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_consecutive_compiles_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_constant_folding_with_update_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_conv3d_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_convolution_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_custom_op_in_subgraph_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_d2h_copy_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_deconv_freezing_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_device_moved_constant_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_duplicated_params_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_dynamic_scalar_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_dynamic_smem_above_default_limit_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_empty_cat_dtype_promotion_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_empty_constant_folding_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_fake_tensor_device_validation_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_fallback_mem_leak_fix_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_fill__fallback_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_foreach_multiple_dynamic_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_fqn_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_free_inactive_buffer_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_index_put_fallback_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_large_dynamic_dim_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_large_grid_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_large_mmaped_weights_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_libtorch_free_so_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_linear_dynamic_maxautotune_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_missing_cubin_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_missing_output_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_mixed_device_1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_model_modified_weights_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_multi_device_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_nan_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_nested_tensor_from_jagged_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_non_default_gpu_device_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_none_args_aot_codegen_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_normal_functional_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_on_gpu_device1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_output_misaligned_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_output_path_1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_output_path_2_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_poi_multiple_dynamic_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_profile_benchmark_harness_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_proxy_executor_permute_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_quanatized_int8_linear_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_quantized_linear_bias_none_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_replace_unbacked_symbol_with_backed_expr_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_replicate_on_devices_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_return_view_constant_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_reuse_kernel_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_complex_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_dtype_failed_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_fp8_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_large_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_shape_failed_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_same_backing_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_scaled_grouped_mm_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_scatter_fallback_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_scatter_reduce_fallback_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_sdpa_2_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_sdpa_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_seq_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_simple_dynamic_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_simple_embed_kernel_binary_False_max_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_simple_embed_kernel_binary_True_max_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_simple_embed_kernel_binary_True_max_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_size_from_multi_output_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_size_with_unbacked_add_expr_transitive_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_stride_with_unbacked_expr_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_subclasses_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_sym_i64_input_codegen_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_symint_item_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_sympy_cpp_printer_min_max_minmax0_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_autotuning_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_bool_param_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_dynamic_grid_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_equal_to_1_float_arg_dynamic_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_extern_kernel_arg_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_reinterpret_view_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_sympy_expr_arg_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_new_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_old_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_new_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_mutated_autotuning_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_expr_replacements_shift_k_0_use_static_size_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbounded_expr_substitutions_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_update_inactive_constant_buffer_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_using_model_name_for_files_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_weight_on_disk_legacy_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_conv_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_mixed_device_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_outer_buffers_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_pytree_inputs_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_sym_expr_cond_dynamic_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_sym_expr_cond_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_unbacked_symint_closure_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_zero_grid_with_unbacked_symbols_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_zero_size_buffer_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_zero_size_weight_cpu_with_stack_allocation
2025-12-04T12:26:56.2126416Z 
2025-12-04T12:26:56.2126674Z Finished inductor/test_aot_inductor_arrayref 1/2 ... [2025-12-04 12:26:56.196856][12856.125150635], took 6.21min
2025-12-04T12:26:56.2217878Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_arrayref/inductor.test_aot_inductor_arrayref-c35059cecd7c3b99.xml
2025-12-04T12:26:56.7595323Z Uploading artifacts took 0.45 seconds
2025-12-04T12:26:56.7597438Z Running inductor/test_halide 1/1 ... [2025-12-04 12:26:56.759534][12856.687830874]
2025-12-04T12:26:56.7597853Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:26:56.7600930Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_halide.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:26:56.759853]
2025-12-04T12:27:02.5066785Z 
2025-12-04T12:27:02.5067632Z inductor/test_halide 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_halide_1.1_2220e392a986bf8c_.log
2025-12-04T12:27:02.5068239Z 
2025-12-04T12:27:02.5068520Z Finished inductor/test_halide 1/1 ... [2025-12-04 12:27:02.506415][12862.434712532], took 0.10min
2025-12-04T12:27:02.5301946Z Running inductor/test_deterministic 1/3 ... [2025-12-04 12:27:02.529936][12862.458235174]
2025-12-04T12:27:02.5302440Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:27:02.5304887Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_deterministic.py', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:27:02.530231]
2025-12-04T12:30:09.8791322Z 
2025-12-04T12:30:09.8792247Z inductor/test_deterministic 1/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_deterministic_1.3_31baf2894918a3e4_.log
2025-12-04T12:30:09.8800484Z Running 15 items in this shard: test/inductor/test_deterministic.py::DeterministicTest::test_max_autotune_deterministic_True, test/inductor/test_deterministic.py::DeterministicTest::test_mm_padding_deterministic_False, test/inductor/test_deterministic.py::DeterministicTest::test_mm_padding_deterministic_True, test/inductor/test_deterministic.py::DeterministicTest::test_reduction_coordesc_tuning_deterministic_False, test/inductor/test_deterministic.py::DeterministicTest::test_reduction_coordesc_tuning_deterministic_True, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_BertForMaskedLM_training_or_inference_inference_precision_amp, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_BertForMaskedLM_training_or_inference_inference_precision_float16, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_BertForMaskedLM_training_or_inference_training_precision_amp, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_BertForMaskedLM_training_or_inference_training_precision_bfloat16, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_BertForMaskedLM_training_or_inference_training_precision_float32, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_amp, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_float16, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_float32, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_inference_precision_bfloat16, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_training_precision_float32
2025-12-04T12:30:09.8806831Z 
2025-12-04T12:30:09.8807086Z Finished inductor/test_deterministic 1/3 ... [2025-12-04 12:30:09.878780][13049.807070703], took 3.12min
2025-12-04T12:30:09.9030727Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccee7b90c33901e0.xml
2025-12-04T12:30:09.9869459Z Running dynamo/test_deque_reconstruct 1/1 ... [2025-12-04 12:30:09.986683][13049.914981784]
2025-12-04T12:30:09.9869951Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:30:09.9873184Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_deque_reconstruct.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:30:09.987035]
2025-12-04T12:30:13.7582095Z 
2025-12-04T12:30:13.7582981Z dynamo/test_deque_reconstruct 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_deque_reconstruct_1.1_5be42757c28bbc33_.log
2025-12-04T12:30:13.7584858Z Running 3 items in this shard: test/dynamo/test_deque_reconstruct.py::TestDequeReconstruct::test_deque_reconstruct_in_globals, test/dynamo/test_deque_reconstruct.py::TestDequeReconstruct::test_deque_reconstruct_not_in_globals, test/dynamo/test_deque_reconstruct.py::TestDequeReconstruct::test_deque_reconstruct_shallows_globals
2025-12-04T12:30:13.7586090Z 
2025-12-04T12:30:13.7586389Z Finished dynamo/test_deque_reconstruct 1/1 ... [2025-12-04 12:30:13.757844][13053.686136317], took 0.06min
2025-12-04T12:30:13.7811259Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_deque_reconstruct/dynamo.test_deque_reconstruct-4527efee43b2418d.xml
2025-12-04T12:30:13.8147987Z Running inductor/test_inductor_annotations 1/1 ... [2025-12-04 12:30:13.814558][13053.742857429]
2025-12-04T12:30:13.8148506Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:30:13.8151574Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inductor_annotations.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:30:13.814899]
2025-12-04T12:30:22.8954768Z 
2025-12-04T12:30:22.8955772Z inductor/test_inductor_annotations 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inductor_annotations_1.1_acd21ad590bd7056_.log
2025-12-04T12:30:22.8957307Z Running 2 items in this shard: test/inductor/test_inductor_annotations.py::InductorAnnotationTestCase::test_no_annotations, test/inductor/test_inductor_annotations.py::InductorAnnotationTestCase::test_training_annotation
2025-12-04T12:30:22.8958143Z 
2025-12-04T12:30:22.8958467Z Finished inductor/test_inductor_annotations 1/1 ... [2025-12-04 12:30:22.895073][13062.823365282], took 0.15min
2025-12-04T12:30:22.9186611Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_inductor_annotations/inductor.test_inductor_annotations-1bfa13dfa66ba37a.xml
2025-12-04T12:30:23.0183258Z Running inductor/test_compile_worker 1/1 ... [2025-12-04 12:30:23.018074][13062.946373264]
2025-12-04T12:30:23.0183732Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:30:23.0186832Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compile_worker.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:30:23.018380]
2025-12-04T12:31:11.0663471Z 
2025-12-04T12:31:11.0665974Z inductor/test_compile_worker 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compile_worker_1.1_22d83b26da38be9a_.log
2025-12-04T12:31:11.0673717Z Running 16 items in this shard: test/inductor/test_compile_worker.py::TestCompileWorker::test_basic_jobs, test/inductor/test_compile_worker.py::TestCompileWorker::test_crash, test/inductor/test_compile_worker.py::TestCompileWorker::test_exception, test/inductor/test_compile_worker.py::TestCompileWorker::test_logging, test/inductor/test_compile_worker.py::TestCompileWorker::test_quiesce, test/inductor/test_compile_worker.py::TestCompileWorker::test_quiesce_repeatedly, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_basic_jobs, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_crash, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_exception, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_logging, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_quiesce, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_quiesce_repeatedly, test/inductor/test_compile_worker.py::TestTimer::test_basics, test/inductor/test_compile_worker.py::TestTimer::test_never_fires, test/inductor/test_compile_worker.py::TestTimer::test_repeated_calls, test/inductor/test_compile_worker.py::TestTimer::test_spammy_calls
2025-12-04T12:31:11.0680825Z 
2025-12-04T12:31:11.0681244Z Finished inductor/test_compile_worker 1/1 ... [2025-12-04 12:31:11.065936][13110.994229199], took 0.80min
2025-12-04T12:31:11.0914856Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_worker/inductor.test_compile_worker-d291715b5fb08603.xml
2025-12-04T12:31:11.1802949Z Running dynamo/test_fx_passes_pre_grad 1/1 ... [2025-12-04 12:31:11.180032][13111.108330738]
2025-12-04T12:31:11.1803441Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:31:11.1806121Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_fx_passes_pre_grad.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:31:11.180340]
2025-12-04T12:31:15.5021022Z 
2025-12-04T12:31:15.5021983Z dynamo/test_fx_passes_pre_grad 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_fx_passes_pre_grad_1.1_2cb42dcf29e7ede2_.log
2025-12-04T12:31:15.5023055Z Running 1 items in this shard: test/dynamo/test_fx_passes_pre_grad.py::FxPassesPreGradTests::test_pass_execution_and_save
2025-12-04T12:31:15.5023535Z 
2025-12-04T12:31:15.5023882Z Finished dynamo/test_fx_passes_pre_grad 1/1 ... [2025-12-04 12:31:15.501784][13115.430081399], took 0.07min
2025-12-04T12:31:15.5262011Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_fx_passes_pre_grad/dynamo.test_fx_passes_pre_grad-8f5f76cf24e3a322.xml
2025-12-04T12:31:15.5590830Z Running inductor/test_fp8 1/1 ... [2025-12-04 12:31:15.558848][13115.487148224]
2025-12-04T12:31:15.5591260Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:31:15.5594074Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fp8.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:31:15.559149]
2025-12-04T12:33:36.6389122Z 
2025-12-04T12:33:36.6390486Z PRINTING LOG FILE of inductor/test_fp8 1/1 (test/test-reports/inductor.test_fp8_1.1_041887d0b8d7fee8_.log)
2025-12-04T12:33:36.6392832Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dccd0f4af0dde98e.xml
2025-12-04T12:33:36.6394729Z ============================= test session starts ==============================
2025-12-04T12:33:36.6395852Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:33:36.6396806Z cachedir: .pytest_cache
2025-12-04T12:33:36.6397690Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:33:36.6398477Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:33:36.6398874Z configfile: pytest.ini
2025-12-04T12:33:36.6399616Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:33:36.6400472Z collecting ... collected 188 items
2025-12-04T12:33:36.6400798Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T12:33:36.6469652Z Running 188 items in this shard: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_bad_cast_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e5m2_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fp8_max_autotune_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fusion_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_scaled_mm_preserves_strides_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_input_dims_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_scale_dims_rowwise_scaling_cuda
2025-12-04T12:33:36.6535388Z 
2025-12-04T12:33:36.6535727Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda PASSED [1.6693s] [  0%]
2025-12-04T12:33:36.6536411Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda PASSED [0.2223s] [  1%]
2025-12-04T12:33:36.6537097Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda PASSED [0.6143s] [  1%]
2025-12-04T12:33:36.6537766Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda PASSED [0.2459s] [  2%]
2025-12-04T12:33:36.6538442Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.6223s] [  2%]
2025-12-04T12:33:36.6539111Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_cuda PASSED [0.4404s] [  3%]
2025-12-04T12:33:36.6539759Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_cuda PASSED [0.4560s] [  3%]
2025-12-04T12:33:36.6540406Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_cuda PASSED [0.4745s] [  4%]
2025-12-04T12:33:36.6541057Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_cuda PASSED [0.4719s] [  4%]
2025-12-04T12:33:36.6541732Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.6369s] [  5%]
2025-12-04T12:33:36.6542366Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda PASSED [0.4181s] [  5%]
2025-12-04T12:33:36.6542974Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda PASSED [0.4264s] [  6%]
2025-12-04T12:33:36.6543579Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda PASSED [0.5481s] [  6%]
2025-12-04T12:33:36.6544186Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda PASSED [0.5481s] [  7%]
2025-12-04T12:33:36.6544797Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.4587s] [  7%]
2025-12-04T12:33:36.6545393Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_cuda PASSED [0.1772s] [  8%]
2025-12-04T12:33:36.6545981Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_cuda PASSED [0.1826s] [  9%]
2025-12-04T12:33:36.6546585Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_cuda PASSED [0.2065s] [  9%]
2025-12-04T12:33:36.6547340Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_cuda PASSED [0.2018s] [ 10%]
2025-12-04T12:33:36.6547937Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.2166s] [ 10%]
2025-12-04T12:33:36.6549127Z inductor/test_fp8.py::TestFP8TypesCUDA::test_bad_cast_cuda C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] Error in codegen for ComputedBuffer(name='buf0', layout=FixedLayout('cuda:0', torch.float8_e5m2, size=[s77, s77, s77], stride=[s77**2, s77, 1]), data=Pointwise(
2025-12-04T12:33:36.6550143Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]   'cuda',
2025-12-04T12:33:36.6550641Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]   torch.float8_e5m2,
2025-12-04T12:33:36.6551166Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]   def inner_fn(index):
2025-12-04T12:33:36.6551713Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]       i0, i1, i2 = index
2025-12-04T12:33:36.6552325Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]       tmp0 = ops.load(arg1_1, i2 + i0 * s77**2 + i1 * s77)
2025-12-04T12:33:36.6553063Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]       tmp1 = ops.to_dtype(tmp0, torch.float8_e5m2, src_dtype=torch.float8_e4m3fn)
2025-12-04T12:33:36.6553722Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]       return tmp1
2025-12-04T12:33:36.6554199Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]   ,
2025-12-04T12:33:36.6554700Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]   ranges=[s77, s77, s77],
2025-12-04T12:33:36.6555283Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]   origin_node=convert_element_type,
2025-12-04T12:33:36.6555932Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]   origins=OrderedSet([convert_element_type]),
2025-12-04T12:33:36.6556518Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]   stack_traces = {,
2025-12-04T12:33:36.6557233Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]     File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 164, in fp8_cast,
2025-12-04T12:33:36.6557984Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]       return x.to(dtype=dtype),
2025-12-04T12:33:36.6558492Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]   ,
2025-12-04T12:33:36.6558919Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0]   }
2025-12-04T12:33:36.6559590Z C1204 12:31:29.903000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/0] ), _split_size=None, _original_inner_fn=None, _original_ranges=None, _original_reduction_ranges=None)
2025-12-04T12:33:36.6560732Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] Error in codegen for ComputedBuffer(name='buf0', layout=FixedLayout('cuda:0', torch.float8_e4m3fn, size=[s77, s77, s77], stride=[s77**2, s77, 1]), data=Pointwise(
2025-12-04T12:33:36.6561580Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]   'cuda',
2025-12-04T12:33:36.6562083Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]   torch.float8_e4m3fn,
2025-12-04T12:33:36.6562622Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]   def inner_fn(index):
2025-12-04T12:33:36.6563158Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]       i0, i1, i2 = index
2025-12-04T12:33:36.6563803Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]       tmp0 = ops.load(arg1_1, i2 + i0 * s77**2 + i1 * s77)
2025-12-04T12:33:36.6564589Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]       tmp1 = ops.to_dtype(tmp0, torch.float8_e4m3fn, src_dtype=torch.float8_e5m2)
2025-12-04T12:33:36.6565235Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]       return tmp1
2025-12-04T12:33:36.6565733Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]   ,
2025-12-04T12:33:36.6566262Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]   ranges=[s77, s77, s77],
2025-12-04T12:33:36.6566838Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]   origin_node=convert_element_type,
2025-12-04T12:33:36.6567470Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]   origins=OrderedSet([convert_element_type]),
2025-12-04T12:33:36.6568048Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]   stack_traces = {,
2025-12-04T12:33:36.6568755Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]     File "/var/lib/jenkins/workspace/test/inductor/test_fp8.py", line 164, in fp8_cast,
2025-12-04T12:33:36.6569477Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]       return x.to(dtype=dtype),
2025-12-04T12:33:36.6569987Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]   ,
2025-12-04T12:33:36.6570420Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1]   }
2025-12-04T12:33:36.6571085Z C1204 12:31:30.158000 166614 site-packages/torch/_inductor/scheduler.py:1683] [0/1] ), _split_size=None, _original_inner_fn=None, _original_ranges=None, _original_reduction_ranges=None)
2025-12-04T12:33:36.6571642Z PASSED [0.4772s] [ 11%]
2025-12-04T12:33:36.6572287Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 W1204 12:31:30.420000 166614 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:33:36.6572936Z PASSED [1.0573s] [ 11%]
2025-12-04T12:33:36.6573340Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 ('RERUN', {'yellow': True}) [1.0094s] [ 12%]
2025-12-04T12:33:36.6574005Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 ('RERUN', {'yellow': True}) [0.9918s] [ 12%]
2025-12-04T12:33:36.6574611Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 FAILED [1.0006s] [ 12%]
2025-12-04T12:33:36.6574933Z 
2025-12-04T12:33:36.6575044Z ==================================== RERUNS ====================================
2025-12-04T12:33:36.6575380Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:33:36.6575707Z Traceback (most recent call last):
2025-12-04T12:33:36.6576174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6576627Z     method(*args, **kwargs)
2025-12-04T12:33:36.6577044Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6577488Z     method(*args, **kwargs)
2025-12-04T12:33:36.6577899Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.6578331Z     with policy():
2025-12-04T12:33:36.6578736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.6579179Z     raise RuntimeError(msg)
2025-12-04T12:33:36.6579962Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456.
2025-12-04T12:33:36.6580770Z 
2025-12-04T12:33:36.6580910Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6581422Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6581858Z 
2025-12-04T12:33:36.6582023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6582441Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6582725Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6582970Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6583342Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6584164Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6584853Z graph_break []
2025-12-04T12:33:36.6585154Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6585596Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:33:36.6585922Z Traceback (most recent call last):
2025-12-04T12:33:36.6586377Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6586822Z     method(*args, **kwargs)
2025-12-04T12:33:36.6587243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6587676Z     method(*args, **kwargs)
2025-12-04T12:33:36.6588089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.6588541Z     with policy():
2025-12-04T12:33:36.6588951Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.6589390Z     raise RuntimeError(msg)
2025-12-04T12:33:36.6590152Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456.
2025-12-04T12:33:36.6590893Z 
2025-12-04T12:33:36.6591024Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6591532Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6591909Z 
2025-12-04T12:33:36.6592070Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6592442Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6592729Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6592958Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6593313Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6594130Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6594838Z graph_break []
2025-12-04T12:33:36.6595128Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6595537Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6595818Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6596039Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6596443Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6597320Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6598042Z graph_break []
2025-12-04T12:33:36.6598364Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6598741Z =================================== FAILURES ===================================
2025-12-04T12:33:36.6599098Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:33:36.6599424Z Traceback (most recent call last):
2025-12-04T12:33:36.6599882Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6600406Z     method(*args, **kwargs)
2025-12-04T12:33:36.6600845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6601294Z     method(*args, **kwargs)
2025-12-04T12:33:36.6601726Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.6602166Z     with policy():
2025-12-04T12:33:36.6602572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.6603016Z     raise RuntimeError(msg)
2025-12-04T12:33:36.6603789Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456.
2025-12-04T12:33:36.6604530Z 
2025-12-04T12:33:36.6604664Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6605183Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6605566Z 
2025-12-04T12:33:36.6605729Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6606104Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6606390Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6606616Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6606977Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6607798Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6608484Z graph_break []
2025-12-04T12:33:36.6608774Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6609185Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6609469Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6609698Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6610055Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6610873Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6611567Z graph_break []
2025-12-04T12:33:36.6611854Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6612354Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6612630Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6612854Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6613244Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6614102Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6614801Z graph_break []
2025-12-04T12:33:36.6615087Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6615743Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dccd0f4af0dde98e.xml -
2025-12-04T12:33:36.6616329Z =========================== short test summary info ============================
2025-12-04T12:33:36.6617656Z FAILED [1.0006s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456.
2025-12-04T12:33:36.6618660Z 
2025-12-04T12:33:36.6618795Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6619311Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6619696Z 
2025-12-04T12:33:36.6619857Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6620220Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:33:36.6620531Z ==================== 1 failed, 22 passed, 2 rerun in 13.85s ====================
2025-12-04T12:33:36.6620789Z Got exit code 1
2025-12-04T12:33:36.6620953Z Retrying single test...
2025-12-04T12:33:36.6621343Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-70b63fabd52069fd.xml
2025-12-04T12:33:36.6621801Z ============================= test session starts ==============================
2025-12-04T12:33:36.6622191Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:33:36.6622546Z cachedir: .pytest_cache
2025-12-04T12:33:36.6622967Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:33:36.6623423Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:33:36.6623631Z configfile: pytest.ini
2025-12-04T12:33:36.6624085Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:33:36.6624654Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:33:36.6625212Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6625709Z Running 1 items in this shard
2025-12-04T12:33:36.6625840Z 
2025-12-04T12:33:36.6626384Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 12:31:41.862994488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6626985Z 
2025-12-04T12:33:36.6627295Z [W1204 12:31:50.512756638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6627672Z 
2025-12-04T12:33:36.6628050Z [W1204 12:31:50.512960432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6628476Z 
2025-12-04T12:33:36.6628784Z [W1204 12:31:50.515354145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6629213Z 
2025-12-04T12:33:36.6629509Z [W1204 12:31:50.515501617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6629882Z 
2025-12-04T12:33:36.6630217Z [W1204 12:31:50.517340580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6630588Z 
2025-12-04T12:33:36.6630889Z [W1204 12:31:50.517599655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6631255Z 
2025-12-04T12:33:36.6631560Z [W1204 12:31:50.517727217 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6631940Z 
2025-12-04T12:33:36.6632230Z [W1204 12:31:50.518410819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6632603Z 
2025-12-04T12:33:36.6632897Z [W1204 12:31:50.518539601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6633271Z 
2025-12-04T12:33:36.6633568Z [W1204 12:31:50.518977349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6633938Z 
2025-12-04T12:33:36.6634237Z [W1204 12:31:50.519106742 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6634607Z 
2025-12-04T12:33:36.6634901Z [W1204 12:31:50.519414497 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6635275Z 
2025-12-04T12:33:36.6635564Z [W1204 12:31:50.519541719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6635952Z 
2025-12-04T12:33:36.6636244Z [W1204 12:31:50.519822724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6636622Z 
2025-12-04T12:33:36.6636915Z [W1204 12:31:50.519947877 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6637283Z 
2025-12-04T12:33:36.6637580Z [W1204 12:31:50.520273063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6637950Z 
2025-12-04T12:33:36.6638246Z [W1204 12:31:50.520423545 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6638626Z 
2025-12-04T12:33:36.6638911Z W1204 12:31:50.848000 167730 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:33:36.6639558Z [W1204 12:31:51.579566518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6639945Z 
2025-12-04T12:33:36.6640331Z [W1204 12:31:51.579883053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6640702Z 
2025-12-04T12:33:36.6641000Z [W1204 12:31:51.580040686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6641367Z 
2025-12-04T12:33:36.6641663Z [W1204 12:31:51.580432262 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6642030Z 
2025-12-04T12:33:36.6642378Z [W1204 12:31:51.580566685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6642803Z 
2025-12-04T12:33:36.6643096Z [W1204 12:31:51.580811919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6643509Z 
2025-12-04T12:33:36.6643803Z [W1204 12:31:51.581020863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6644206Z 
2025-12-04T12:33:36.6644507Z [W1204 12:31:51.581148295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6644876Z 
2025-12-04T12:33:36.6645174Z [W1204 12:31:51.581457830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6645544Z 
2025-12-04T12:33:36.6645836Z [W1204 12:31:51.581586652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6646225Z 
2025-12-04T12:33:36.6646518Z [W1204 12:31:51.581889158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6646899Z 
2025-12-04T12:33:36.6647191Z [W1204 12:31:51.582018500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6647567Z 
2025-12-04T12:33:36.6647863Z [W1204 12:31:51.582289445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6648230Z 
2025-12-04T12:33:36.6648533Z [W1204 12:31:51.582415297 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6648901Z 
2025-12-04T12:33:36.6649213Z [W1204 12:31:51.582670111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6649587Z 
2025-12-04T12:33:36.6649883Z [W1204 12:31:51.582794043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6650260Z 
2025-12-04T12:33:36.6650556Z [W1204 12:31:51.583048118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6650935Z 
2025-12-04T12:33:36.6651230Z [W1204 12:31:51.583175210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6651600Z 
2025-12-04T12:33:36.6651691Z ('RERUN', {'yellow': True}) [11.2691s] [100%]
2025-12-04T12:33:36.6652374Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 12:31:52.229000514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6652982Z 
2025-12-04T12:33:36.6653279Z [W1204 12:31:52.229279409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6653652Z 
2025-12-04T12:33:36.6653942Z [W1204 12:31:52.229416141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6654318Z 
2025-12-04T12:33:36.6654612Z [W1204 12:31:52.229788888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6654981Z 
2025-12-04T12:33:36.6655279Z [W1204 12:31:52.229922650 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6655648Z 
2025-12-04T12:33:36.6655943Z [W1204 12:31:52.230198395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6656313Z 
2025-12-04T12:33:36.6656655Z [W1204 12:31:52.230407758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6657068Z 
2025-12-04T12:33:36.6657357Z [W1204 12:31:52.230526210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6657771Z 
2025-12-04T12:33:36.6658115Z [W1204 12:31:52.230837466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6658556Z 
2025-12-04T12:33:36.6659039Z [W1204 12:31:52.230966008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6659441Z 
2025-12-04T12:33:36.6659746Z [W1204 12:31:52.231267723 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6660123Z 
2025-12-04T12:33:36.6660425Z [W1204 12:31:52.231393465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6660811Z 
2025-12-04T12:33:36.6661101Z [W1204 12:31:52.231661020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6661478Z 
2025-12-04T12:33:36.6661770Z [W1204 12:31:52.231784352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6662138Z 
2025-12-04T12:33:36.6662437Z [W1204 12:31:52.232053847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6662804Z 
2025-12-04T12:33:36.6663101Z [W1204 12:31:52.232176649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6663471Z 
2025-12-04T12:33:36.6663764Z [W1204 12:31:52.232447653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6664154Z 
2025-12-04T12:33:36.6664450Z [W1204 12:31:52.232569575 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6664825Z 
2025-12-04T12:33:36.6665113Z [W1204 12:31:52.873924561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6665481Z 
2025-12-04T12:33:36.6665777Z [W1204 12:31:52.874201396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6666145Z 
2025-12-04T12:33:36.6666445Z [W1204 12:31:52.874333238 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6666815Z 
2025-12-04T12:33:36.6667106Z [W1204 12:31:52.874700785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6667483Z 
2025-12-04T12:33:36.6667775Z [W1204 12:31:52.874833467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6668153Z 
2025-12-04T12:33:36.6668448Z [W1204 12:31:52.875075051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6668837Z 
2025-12-04T12:33:36.6669140Z [W1204 12:31:52.875281324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6669507Z 
2025-12-04T12:33:36.6669804Z [W1204 12:31:52.875408187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6670174Z 
2025-12-04T12:33:36.6670464Z [W1204 12:31:52.875719212 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6670932Z 
2025-12-04T12:33:36.6671239Z [W1204 12:31:52.875847714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6671616Z 
2025-12-04T12:33:36.6671908Z [W1204 12:31:52.876154200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6672321Z 
2025-12-04T12:33:36.6672672Z [W1204 12:31:52.876283882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6673042Z 
2025-12-04T12:33:36.6673341Z [W1204 12:31:52.876561817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6673708Z 
2025-12-04T12:33:36.6674006Z [W1204 12:31:52.876690239 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6674378Z 
2025-12-04T12:33:36.6674671Z [W1204 12:31:52.876946863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6675046Z 
2025-12-04T12:33:36.6675337Z [W1204 12:31:52.877072726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6675715Z 
2025-12-04T12:33:36.6676009Z [W1204 12:31:52.877332280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6676377Z 
2025-12-04T12:33:36.6676674Z [W1204 12:31:52.877455782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6677055Z 
2025-12-04T12:33:36.6677151Z ('RERUN', {'yellow': True}) [1.3607s] [100%]
2025-12-04T12:33:36.6677842Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 12:31:53.591663956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6678449Z 
2025-12-04T12:33:36.6678743Z [W1204 12:31:53.591935161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6679125Z 
2025-12-04T12:33:36.6679418Z [W1204 12:31:53.592066183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6679790Z 
2025-12-04T12:33:36.6680208Z [W1204 12:31:53.592443970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6680579Z 
2025-12-04T12:33:36.6680880Z [W1204 12:31:53.592574422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6681249Z 
2025-12-04T12:33:36.6681543Z [W1204 12:31:53.592822476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6681931Z 
2025-12-04T12:33:36.6682226Z [W1204 12:31:53.593022900 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6682603Z 
2025-12-04T12:33:36.6682896Z [W1204 12:31:53.593144812 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6683267Z 
2025-12-04T12:33:36.6683566Z [W1204 12:31:53.593450937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6683936Z 
2025-12-04T12:33:36.6684233Z [W1204 12:31:53.593579739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6684605Z 
2025-12-04T12:33:36.6684955Z [W1204 12:31:53.593888295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6685377Z 
2025-12-04T12:33:36.6685669Z [W1204 12:31:53.594013927 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6686080Z 
2025-12-04T12:33:36.6686375Z [W1204 12:31:53.594281641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6686742Z 
2025-12-04T12:33:36.6687071Z [W1204 12:31:53.594407573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6687442Z 
2025-12-04T12:33:36.6687737Z [W1204 12:31:53.594660678 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6688107Z 
2025-12-04T12:33:36.6688400Z [W1204 12:31:53.594783290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6688778Z 
2025-12-04T12:33:36.6689077Z [W1204 12:31:53.595041815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6689453Z 
2025-12-04T12:33:36.6689747Z [W1204 12:31:53.595166257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6690116Z 
2025-12-04T12:33:36.6690415Z [W1204 12:31:54.391603648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6690786Z 
2025-12-04T12:33:36.6691083Z [W1204 12:31:54.391898333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6691450Z 
2025-12-04T12:33:36.6691742Z [W1204 12:31:54.392036466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6692119Z 
2025-12-04T12:33:36.6692411Z [W1204 12:31:54.392408172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6692783Z 
2025-12-04T12:33:36.6693073Z [W1204 12:31:54.392540764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6693443Z 
2025-12-04T12:33:36.6693743Z [W1204 12:31:54.392782868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6694203Z 
2025-12-04T12:33:36.6694662Z [W1204 12:31:54.392986432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6695039Z 
2025-12-04T12:33:36.6695329Z [W1204 12:31:54.393111974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6695702Z 
2025-12-04T12:33:36.6696015Z [W1204 12:31:54.393427140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6696388Z 
2025-12-04T12:33:36.6696678Z [W1204 12:31:54.393556732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6697063Z 
2025-12-04T12:33:36.6697359Z [W1204 12:31:54.393856737 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6697729Z 
2025-12-04T12:33:36.6698027Z [W1204 12:31:54.393984509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6698397Z 
2025-12-04T12:33:36.6698695Z [W1204 12:31:54.394246924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6699063Z 
2025-12-04T12:33:36.6699410Z [W1204 12:31:54.394373356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6699823Z 
2025-12-04T12:33:36.6700114Z [W1204 12:31:54.394625060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6700521Z 
2025-12-04T12:33:36.6700845Z [W1204 12:31:54.394746863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6701217Z 
2025-12-04T12:33:36.6701515Z [W1204 12:31:54.394999607 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6701887Z 
2025-12-04T12:33:36.6702187Z [W1204 12:31:54.395125199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6702565Z 
2025-12-04T12:33:36.6702631Z FAILED [1.4154s] [100%]
2025-12-04T12:33:36.6702749Z 
2025-12-04T12:33:36.6702842Z ==================================== RERUNS ====================================
2025-12-04T12:33:36.6703187Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:33:36.6703504Z Traceback (most recent call last):
2025-12-04T12:33:36.6703972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6704425Z     method(*args, **kwargs)
2025-12-04T12:33:36.6704860Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6705313Z     method(*args, **kwargs)
2025-12-04T12:33:36.6705743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.6706188Z     with policy():
2025-12-04T12:33:36.6706609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.6707056Z     raise RuntimeError(msg)
2025-12-04T12:33:36.6707832Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 230686720 and is now 268435456.
2025-12-04T12:33:36.6708566Z 
2025-12-04T12:33:36.6708709Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6709231Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6709609Z 
2025-12-04T12:33:36.6709773Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6710155Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6710447Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6710680Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6711427Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6712252Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6712579Z graph_break []
2025-12-04T12:33:36.6712869Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6713274Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:33:36.6714229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:33:36.6715105Z   if out == self.unknown_value:
2025-12-04T12:33:36.6715403Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:33:36.6715719Z Traceback (most recent call last):
2025-12-04T12:33:36.6716226Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6716682Z     method(*args, **kwargs)
2025-12-04T12:33:36.6717493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6717966Z     method(*args, **kwargs)
2025-12-04T12:33:36.6718411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.6718853Z     with policy():
2025-12-04T12:33:36.6719266Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.6719725Z     raise RuntimeError(msg)
2025-12-04T12:33:36.6720564Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456.
2025-12-04T12:33:36.6721291Z 
2025-12-04T12:33:36.6721425Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6721965Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6722350Z 
2025-12-04T12:33:36.6722513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6722893Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6723172Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6723404Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6724152Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6724957Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6725281Z graph_break []
2025-12-04T12:33:36.6725578Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6725987Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:33:36.6726919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:33:36.6727765Z   if out == self.unknown_value:
2025-12-04T12:33:36.6728042Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6728326Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6728554Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6728927Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6729743Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6730434Z graph_break []
2025-12-04T12:33:36.6730722Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6731085Z =================================== FAILURES ===================================
2025-12-04T12:33:36.6731500Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:33:36.6731885Z Traceback (most recent call last):
2025-12-04T12:33:36.6732347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6732856Z     method(*args, **kwargs)
2025-12-04T12:33:36.6733287Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6733772Z     method(*args, **kwargs)
2025-12-04T12:33:36.6734195Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.6734645Z     with policy():
2025-12-04T12:33:36.6735045Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.6735489Z     raise RuntimeError(msg)
2025-12-04T12:33:36.6736257Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456.
2025-12-04T12:33:36.6736979Z 
2025-12-04T12:33:36.6737110Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6737638Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6738015Z 
2025-12-04T12:33:36.6738176Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6738549Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6738829Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6739057Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6739780Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6740595Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6740919Z graph_break []
2025-12-04T12:33:36.6741207Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6741601Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:33:36.6742507Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:33:36.6743344Z   if out == self.unknown_value:
2025-12-04T12:33:36.6743599Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6743884Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6744111Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6744464Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6745279Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6745968Z graph_break []
2025-12-04T12:33:36.6746261Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6746665Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6746935Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6747162Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6747607Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6748419Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6749132Z graph_break []
2025-12-04T12:33:36.6749461Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6750120Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-70b63fabd52069fd.xml -
2025-12-04T12:33:36.6750674Z =========================== short test summary info ============================
2025-12-04T12:33:36.6751793Z FAILED [1.4154s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456.
2025-12-04T12:33:36.6752791Z 
2025-12-04T12:33:36.6752923Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6753439Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6753819Z 
2025-12-04T12:33:36.6753985Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6754324Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:33:36.6754637Z ================= 1 failed, 187 deselected, 2 rerun in 14.09s ==================
2025-12-04T12:33:36.6754906Z Got exit code 1
2025-12-04T12:33:36.6755072Z Retrying single test...
2025-12-04T12:33:36.6755475Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8f8e73b1ecdff271.xml
2025-12-04T12:33:36.6755937Z ============================= test session starts ==============================
2025-12-04T12:33:36.6756329Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:33:36.6756678Z cachedir: .pytest_cache
2025-12-04T12:33:36.6757095Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:33:36.6757561Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:33:36.6757774Z configfile: pytest.ini
2025-12-04T12:33:36.6758230Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:33:36.6758796Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:33:36.6759364Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6759854Z Running 1 items in this shard
2025-12-04T12:33:36.6760071Z 
2025-12-04T12:33:36.6760609Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 12:32:02.355911129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6761218Z 
2025-12-04T12:33:36.6761522Z [W1204 12:32:11.539936289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6761907Z 
2025-12-04T12:33:36.6762201Z [W1204 12:32:11.540171263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6762580Z 
2025-12-04T12:33:36.6762881Z [W1204 12:32:11.542493394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6763313Z 
2025-12-04T12:33:36.6763648Z [W1204 12:32:11.542636526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6764017Z 
2025-12-04T12:33:36.6764308Z [W1204 12:32:11.544468648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6764719Z 
2025-12-04T12:33:36.6765053Z [W1204 12:32:11.544727433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6765430Z 
2025-12-04T12:33:36.6765718Z [W1204 12:32:11.544858445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6766087Z 
2025-12-04T12:33:36.6766384Z [W1204 12:32:11.545241531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6766754Z 
2025-12-04T12:33:36.6767056Z [W1204 12:32:11.545371474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6767422Z 
2025-12-04T12:33:36.6767725Z [W1204 12:32:11.545802361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6768103Z 
2025-12-04T12:33:36.6768394Z [W1204 12:32:11.545930323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6768769Z 
2025-12-04T12:33:36.6769059Z [W1204 12:32:11.546239809 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6769429Z 
2025-12-04T12:33:36.6769724Z [W1204 12:32:11.546368511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6770091Z 
2025-12-04T12:33:36.6770398Z [W1204 12:32:11.546639236 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6770771Z 
2025-12-04T12:33:36.6771063Z [W1204 12:32:11.546763178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6771439Z 
2025-12-04T12:33:36.6771731Z [W1204 12:32:11.547039163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6772104Z 
2025-12-04T12:33:36.6772396Z [W1204 12:32:11.547161705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6772766Z 
2025-12-04T12:33:36.6773052Z W1204 12:32:11.875000 168209 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:33:36.6773712Z [W1204 12:32:12.641261482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6774088Z 
2025-12-04T12:33:36.6774381Z [W1204 12:32:12.641588608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6774764Z 
2025-12-04T12:33:36.6775063Z [W1204 12:32:12.641724280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6775447Z 
2025-12-04T12:33:36.6775742Z [W1204 12:32:12.642103397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6776115Z 
2025-12-04T12:33:36.6776413Z [W1204 12:32:12.642234109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6776783Z 
2025-12-04T12:33:36.6777124Z [W1204 12:32:12.642480253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6777554Z 
2025-12-04T12:33:36.6777856Z [W1204 12:32:12.642715317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6778235Z 
2025-12-04T12:33:36.6778567Z [W1204 12:32:12.642841750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6778942Z 
2025-12-04T12:33:36.6779271Z [W1204 12:32:12.643157025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6779641Z 
2025-12-04T12:33:36.6779939Z [W1204 12:32:12.643285367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6780305Z 
2025-12-04T12:33:36.6780605Z [W1204 12:32:12.643600583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6780981Z 
2025-12-04T12:33:36.6781275Z [W1204 12:32:12.643725795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6781654Z 
2025-12-04T12:33:36.6781947Z [W1204 12:32:12.643995310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6782327Z 
2025-12-04T12:33:36.6782621Z [W1204 12:32:12.644118202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6782990Z 
2025-12-04T12:33:36.6783290Z [W1204 12:32:12.644386106 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6783663Z 
2025-12-04T12:33:36.6783959Z [W1204 12:32:12.644507579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6784334Z 
2025-12-04T12:33:36.6784631Z [W1204 12:32:12.644764583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6785009Z 
2025-12-04T12:33:36.6785308Z [W1204 12:32:12.644884705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6785698Z 
2025-12-04T12:33:36.6785784Z ('RERUN', {'yellow': True}) [11.8591s] [100%]
2025-12-04T12:33:36.6786494Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 12:32:13.291398934 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6787094Z 
2025-12-04T12:33:36.6787395Z [W1204 12:32:13.291673739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6787770Z 
2025-12-04T12:33:36.6788066Z [W1204 12:32:13.291802321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6788441Z 
2025-12-04T12:33:36.6788732Z [W1204 12:32:13.292171368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6789107Z 
2025-12-04T12:33:36.6789402Z [W1204 12:32:13.292308270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6789771Z 
2025-12-04T12:33:36.6790065Z [W1204 12:32:13.292566764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6790430Z 
2025-12-04T12:33:36.6790725Z [W1204 12:32:13.292769018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6791094Z 
2025-12-04T12:33:36.6791431Z [W1204 12:32:13.292887450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6791852Z 
2025-12-04T12:33:36.6792146Z [W1204 12:32:13.293206535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6792558Z 
2025-12-04T12:33:36.6792858Z [W1204 12:32:13.293331308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6793237Z 
2025-12-04T12:33:36.6793563Z [W1204 12:32:13.293640673 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6793931Z 
2025-12-04T12:33:36.6794228Z [W1204 12:32:13.293765235 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6794595Z 
2025-12-04T12:33:36.6794888Z [W1204 12:32:13.294034340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6795265Z 
2025-12-04T12:33:36.6795554Z [W1204 12:32:13.294156912 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6795932Z 
2025-12-04T12:33:36.6796222Z [W1204 12:32:13.294416456 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6796597Z 
2025-12-04T12:33:36.6796897Z [W1204 12:32:13.294535638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6797270Z 
2025-12-04T12:33:36.6797569Z [W1204 12:32:13.294801713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6797939Z 
2025-12-04T12:33:36.6798233Z [W1204 12:32:13.294921935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6798604Z 
2025-12-04T12:33:36.6798894Z [W1204 12:32:14.953990260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6799267Z 
2025-12-04T12:33:36.6799558Z [W1204 12:32:14.954268835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6799936Z 
2025-12-04T12:33:36.6800297Z [W1204 12:32:14.954412437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6800666Z 
2025-12-04T12:33:36.6800966Z [W1204 12:32:14.954782904 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6801347Z 
2025-12-04T12:33:36.6801646Z [W1204 12:32:14.954916226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6802014Z 
2025-12-04T12:33:36.6802320Z [W1204 12:32:14.955159930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6802695Z 
2025-12-04T12:33:36.6802986Z [W1204 12:32:14.955361604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6803361Z 
2025-12-04T12:33:36.6803656Z [W1204 12:32:14.955485696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6804030Z 
2025-12-04T12:33:36.6804326Z [W1204 12:32:14.955799142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6804693Z 
2025-12-04T12:33:36.6804997Z [W1204 12:32:14.955931144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6805364Z 
2025-12-04T12:33:36.6805707Z [W1204 12:32:14.956231759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6806115Z 
2025-12-04T12:33:36.6806411Z [W1204 12:32:14.956368431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6806821Z 
2025-12-04T12:33:36.6807144Z [W1204 12:32:14.956639466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6807512Z 
2025-12-04T12:33:36.6807811Z [W1204 12:32:14.956764518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6808181Z 
2025-12-04T12:33:36.6808481Z [W1204 12:32:14.957017083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6808853Z 
2025-12-04T12:33:36.6809145Z [W1204 12:32:14.957139385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6809520Z 
2025-12-04T12:33:36.6809813Z [W1204 12:32:14.957393479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6810184Z 
2025-12-04T12:33:36.6810475Z [W1204 12:32:14.957513762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6810850Z 
2025-12-04T12:33:36.6810938Z ('RERUN', {'yellow': True}) [1.3700s] [100%]
2025-12-04T12:33:36.6811626Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 12:32:14.663759956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6812223Z 
2025-12-04T12:33:36.6812523Z [W1204 12:32:14.664035000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6812907Z 
2025-12-04T12:33:36.6813200Z [W1204 12:32:14.664165683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6813575Z 
2025-12-04T12:33:36.6813871Z [W1204 12:32:14.664548099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6814243Z 
2025-12-04T12:33:36.6814542Z [W1204 12:32:14.664680671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6814913Z 
2025-12-04T12:33:36.6815218Z [W1204 12:32:14.664930116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6815603Z 
2025-12-04T12:33:36.6815894Z [W1204 12:32:14.665139090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6816274Z 
2025-12-04T12:33:36.6816569Z [W1204 12:32:14.665261612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6816947Z 
2025-12-04T12:33:36.6817412Z [W1204 12:32:14.665577057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6817791Z 
2025-12-04T12:33:36.6818087Z [W1204 12:32:14.665706949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6818458Z 
2025-12-04T12:33:36.6818758Z [W1204 12:32:14.666013265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6819128Z 
2025-12-04T12:33:36.6819422Z [W1204 12:32:14.666139997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6819933Z 
2025-12-04T12:33:36.6820229Z [W1204 12:32:14.666406761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6820604Z 
2025-12-04T12:33:36.6820898Z [W1204 12:32:14.666530523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6821331Z 
2025-12-04T12:33:36.6821674Z [W1204 12:32:14.666789618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6822048Z 
2025-12-04T12:33:36.6822349Z [W1204 12:32:14.666911360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6822719Z 
2025-12-04T12:33:36.6823026Z [W1204 12:32:14.667170594 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6823391Z 
2025-12-04T12:33:36.6823686Z [W1204 12:32:14.667293976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6824058Z 
2025-12-04T12:33:36.6824349Z [W1204 12:32:15.388530850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6824721Z 
2025-12-04T12:33:36.6825012Z [W1204 12:32:15.388827585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6825388Z 
2025-12-04T12:33:36.6825683Z [W1204 12:32:15.388959288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6826050Z 
2025-12-04T12:33:36.6826347Z [W1204 12:32:15.389329914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6826717Z 
2025-12-04T12:33:36.6827010Z [W1204 12:32:15.389459516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6827383Z 
2025-12-04T12:33:36.6827677Z [W1204 12:32:15.389698960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6828054Z 
2025-12-04T12:33:36.6828347Z [W1204 12:32:15.389904694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6828713Z 
2025-12-04T12:33:36.6829012Z [W1204 12:32:15.390056827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6829379Z 
2025-12-04T12:33:36.6829677Z [W1204 12:32:15.390382852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6830044Z 
2025-12-04T12:33:36.6830334Z [W1204 12:32:15.390509814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6830708Z 
2025-12-04T12:33:36.6831001Z [W1204 12:32:15.390815710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6831387Z 
2025-12-04T12:33:36.6831682Z [W1204 12:32:15.390942132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6832052Z 
2025-12-04T12:33:36.6832349Z [W1204 12:32:15.391203537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6832720Z 
2025-12-04T12:33:36.6833016Z [W1204 12:32:15.391327149 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6833384Z 
2025-12-04T12:33:36.6833737Z [W1204 12:32:15.391574323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6834151Z 
2025-12-04T12:33:36.6834441Z [W1204 12:32:15.391701245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6834859Z 
2025-12-04T12:33:36.6835171Z [W1204 12:32:15.391958200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6835544Z 
2025-12-04T12:33:36.6835879Z [W1204 12:32:15.392081102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T12:33:36.6836249Z 
2025-12-04T12:33:36.6836321Z FAILED [1.3491s] [100%]
2025-12-04T12:33:36.6836430Z 
2025-12-04T12:33:36.6836521Z ==================================== RERUNS ====================================
2025-12-04T12:33:36.6836864Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:33:36.6837186Z Traceback (most recent call last):
2025-12-04T12:33:36.6837659Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6838109Z     method(*args, **kwargs)
2025-12-04T12:33:36.6838542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6838990Z     method(*args, **kwargs)
2025-12-04T12:33:36.6839416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.6839878Z     with policy():
2025-12-04T12:33:36.6840338Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.6840792Z     raise RuntimeError(msg)
2025-12-04T12:33:36.6841564Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 230686720 and is now 268435456.
2025-12-04T12:33:36.6842311Z 
2025-12-04T12:33:36.6842445Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6842972Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6843364Z 
2025-12-04T12:33:36.6843536Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6843912Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6844206Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6844440Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6845181Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6845991Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6846318Z graph_break []
2025-12-04T12:33:36.6846614Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6847011Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:33:36.6847921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:33:36.6848778Z   if out == self.unknown_value:
2025-12-04T12:33:36.6849076Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:33:36.6849392Z Traceback (most recent call last):
2025-12-04T12:33:36.6849933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6850383Z     method(*args, **kwargs)
2025-12-04T12:33:36.6850807Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6851279Z     method(*args, **kwargs)
2025-12-04T12:33:36.6851725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.6852182Z     with policy():
2025-12-04T12:33:36.6852581Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.6853044Z     raise RuntimeError(msg)
2025-12-04T12:33:36.6853813Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456.
2025-12-04T12:33:36.6854532Z 
2025-12-04T12:33:36.6854676Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6855198Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6855591Z 
2025-12-04T12:33:36.6855756Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6856126Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6856405Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6856630Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6857359Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6858185Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6858519Z graph_break []
2025-12-04T12:33:36.6858809Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6859216Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:33:36.6860127Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:33:36.6860963Z   if out == self.unknown_value:
2025-12-04T12:33:36.6861222Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6861503Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6861737Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6862096Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6862910Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6863610Z graph_break []
2025-12-04T12:33:36.6863898Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6864251Z =================================== FAILURES ===================================
2025-12-04T12:33:36.6864592Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________
2025-12-04T12:33:36.6864909Z Traceback (most recent call last):
2025-12-04T12:33:36.6865421Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6865914Z     method(*args, **kwargs)
2025-12-04T12:33:36.6866345Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6866790Z     method(*args, **kwargs)
2025-12-04T12:33:36.6867240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.6867692Z     with policy():
2025-12-04T12:33:36.6868138Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.6868591Z     raise RuntimeError(msg)
2025-12-04T12:33:36.6869351Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456.
2025-12-04T12:33:36.6870084Z 
2025-12-04T12:33:36.6870216Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6870733Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6871115Z 
2025-12-04T12:33:36.6871283Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6871643Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6871927Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6872158Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6872886Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6873685Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6874009Z graph_break []
2025-12-04T12:33:36.6874307Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6874709Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:33:36.6875620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T12:33:36.6876457Z   if out == self.unknown_value:
2025-12-04T12:33:36.6876712Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6876989Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6877211Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6877567Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6878383Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6879067Z graph_break []
2025-12-04T12:33:36.6879347Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6879747Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6880073Z frames [('total', 2), ('ok', 2)]
2025-12-04T12:33:36.6880305Z stats [('calls_captured', 22), ('unique_graphs', 2)]
2025-12-04T12:33:36.6880671Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
2025-12-04T12:33:36.6881531Z inductor [('triton_bundler_save_kernel', 48), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)]
2025-12-04T12:33:36.6882264Z graph_break []
2025-12-04T12:33:36.6882554Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)]
2025-12-04T12:33:36.6883255Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8f8e73b1ecdff271.xml -
2025-12-04T12:33:36.6883874Z =========================== short test summary info ============================
2025-12-04T12:33:36.6884995Z FAILED [1.3491s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 266338304 and is now 268435456.
2025-12-04T12:33:36.6885989Z 
2025-12-04T12:33:36.6886134Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6886644Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6887034Z 
2025-12-04T12:33:36.6887196Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6887544Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:33:36.6887856Z ================= 1 failed, 187 deselected, 2 rerun in 14.62s ==================
2025-12-04T12:33:36.6888125Z Got exit code 1
2025-12-04T12:33:36.6888484Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16
2025-12-04T12:33:36.6889059Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:33:36.6889661Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b260dbfbe2039817.xml
2025-12-04T12:33:36.6890123Z ============================= test session starts ==============================
2025-12-04T12:33:36.6890517Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:33:36.6890874Z cachedir: .pytest_cache
2025-12-04T12:33:36.6891290Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:33:36.6891750Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:33:36.6891980Z configfile: pytest.ini
2025-12-04T12:33:36.6892440Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:33:36.6893006Z collecting ... collected 188 items / 23 deselected / 165 selected
2025-12-04T12:33:36.6893304Z stepcurrent: skipping 23 already run items.
2025-12-04T12:33:36.6893532Z Running 165 items in this shard
2025-12-04T12:33:36.6893661Z 
2025-12-04T12:33:36.6894027Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False_cuda PASSED [3.1339s] [  0%]
2025-12-04T12:33:36.6894808Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True_cuda PASSED [1.8887s] [  1%]
2025-12-04T12:33:36.6895585Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False_cuda PASSED [1.8864s] [  1%]
2025-12-04T12:33:36.6896353Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True_cuda PASSED [1.8860s] [  2%]
2025-12-04T12:33:36.6897095Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda PASSED [0.5796s] [  3%]
2025-12-04T12:33:36.6897896Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda PASSED [0.7819s] [  3%]
2025-12-04T12:33:36.6898665Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda PASSED [0.9509s] [  4%]
2025-12-04T12:33:36.6899406Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda PASSED [0.6066s] [  4%]
2025-12-04T12:33:36.6900212Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda PASSED [0.6533s] [  5%]
2025-12-04T12:33:36.6900966Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda PASSED [0.2766s] [  6%]
2025-12-04T12:33:36.6901696Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda PASSED [0.3926s] [  6%]
2025-12-04T12:33:36.6902426Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda PASSED [0.6147s] [  7%]
2025-12-04T12:33:36.6903161Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda PASSED [0.3301s] [  7%]
2025-12-04T12:33:36.6903904Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda PASSED [0.6516s] [  8%]
2025-12-04T12:33:36.6904640Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_cuda PASSED [0.4575s] [  9%]
2025-12-04T12:33:36.6905361Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_cuda PASSED [0.3974s] [  9%]
2025-12-04T12:33:36.6906087Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_cuda PASSED [0.6255s] [ 10%]
2025-12-04T12:33:36.6906808Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_cuda PASSED [0.3353s] [ 10%]
2025-12-04T12:33:36.6907538Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_cuda PASSED [0.6509s] [ 11%]
2025-12-04T12:33:36.6908267Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_cuda PASSED [0.2751s] [ 12%]
2025-12-04T12:33:36.6908980Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_cuda PASSED [0.3896s] [ 12%]
2025-12-04T12:33:36.6909695Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_cuda PASSED [0.6187s] [ 13%]
2025-12-04T12:33:36.6910416Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_cuda PASSED [0.3295s] [ 13%]
2025-12-04T12:33:36.6911140Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_cuda PASSED [0.6494s] [ 14%]
2025-12-04T12:33:36.6911844Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda PASSED [0.6538s] [ 15%]
2025-12-04T12:33:36.6912536Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.6588s] [ 15%]
2025-12-04T12:33:36.6913222Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_cuda PASSED [0.4600s] [ 16%]
2025-12-04T12:33:36.6913892Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.5813s] [ 16%]
2025-12-04T12:33:36.6914560Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda PASSED [0.4348s] [ 17%]
2025-12-04T12:33:36.6915226Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.5822s] [ 18%]
2025-12-04T12:33:36.6915945Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_cuda PASSED [0.4369s] [ 18%]
2025-12-04T12:33:36.6926587Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.8019s] [ 19%]
2025-12-04T12:33:36.6927516Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda PASSED [0.4047s] [ 20%]
2025-12-04T12:33:36.6928288Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.5407s] [ 20%]
2025-12-04T12:33:36.6928970Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_cuda PASSED [0.4070s] [ 21%]
2025-12-04T12:33:36.6929625Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.5416s] [ 21%]
2025-12-04T12:33:36.6930316Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 PASSED [0.2214s] [ 22%]
2025-12-04T12:33:36.6930992Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 PASSED [0.3850s] [ 23%]
2025-12-04T12:33:36.6931670Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 PASSED [0.2096s] [ 23%]
2025-12-04T12:33:36.6932331Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 PASSED [0.3803s] [ 24%]
2025-12-04T12:33:36.6932986Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 PASSED [0.2056s] [ 24%]
2025-12-04T12:33:36.6933634Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 PASSED [0.3968s] [ 25%]
2025-12-04T12:33:36.6934261Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda PASSED [0.1149s] [ 26%]
2025-12-04T12:33:36.6934840Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e5m2_cuda PASSED [0.1153s] [ 26%]
2025-12-04T12:33:36.6935617Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes0_cuda SKIPPED [0.0003s] (Need device-side TMA support in Triton) [ 27%]
2025-12-04T12:33:36.6936596Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes1_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 27%]
2025-12-04T12:33:36.6937555Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes0_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 28%]
2025-12-04T12:33:36.6938536Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes1_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 29%]
2025-12-04T12:33:36.6939496Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes0_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 29%]
2025-12-04T12:33:36.6940475Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes1_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 30%]
2025-12-04T12:33:36.6941441Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes0_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 30%]
2025-12-04T12:33:36.6942391Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes1_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 31%]
2025-12-04T12:33:36.6943197Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fp8_max_autotune_cuda SKIPPED [0.0002s] (Not supported on non B200) [ 32%]
2025-12-04T12:33:36.6943824Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fusion_cuda PASSED [5.0432s] [ 32%]
2025-12-04T12:33:36.6944803Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda W1204 12:32:55.525000 168690 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:33:36.6945658Z ('RERUN', {'yellow': True}) [0.5648s] [ 33%]
2025-12-04T12:33:36.6946300Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5077s] [ 33%]
2025-12-04T12:33:36.6947193Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.5015s] [ 33%]
2025-12-04T12:33:36.6947669Z 
2025-12-04T12:33:36.6947771Z ==================================== RERUNS ====================================
2025-12-04T12:33:36.6948236Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _
2025-12-04T12:33:36.6948695Z Traceback (most recent call last):
2025-12-04T12:33:36.6949223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6949683Z     method(*args, **kwargs)
2025-12-04T12:33:36.6950134Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6950590Z     method(*args, **kwargs)
2025-12-04T12:33:36.6950998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.6951431Z     with policy():
2025-12-04T12:33:36.6951825Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.6952265Z     raise RuntimeError(msg)
2025-12-04T12:33:36.6953209Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 276824064 and is now reported as 276825088 on device 0. CUDA driver allocated memory was 742391808 and is now 767557632.
2025-12-04T12:33:36.6954116Z 
2025-12-04T12:33:36.6954251Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6954913Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda
2025-12-04T12:33:36.6955440Z 
2025-12-04T12:33:36.6955608Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6955979Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6956262Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:33:36.6956482Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T12:33:36.6956837Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T12:33:36.6957223Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)]
2025-12-04T12:33:36.6957474Z graph_break []
2025-12-04T12:33:36.6957678Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)]
2025-12-04T12:33:36.6958139Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _
2025-12-04T12:33:36.6958564Z Traceback (most recent call last):
2025-12-04T12:33:36.6959012Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6959463Z     method(*args, **kwargs)
2025-12-04T12:33:36.6959873Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6960372Z     method(*args, **kwargs)
2025-12-04T12:33:36.6960828Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.6961308Z     with policy():
2025-12-04T12:33:36.6961723Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.6962199Z     raise RuntimeError(msg)
2025-12-04T12:33:36.6963166Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 276824064 and is now reported as 276825088 on device 0. CUDA driver allocated memory was 742391808 and is now 767557632.
2025-12-04T12:33:36.6964046Z 
2025-12-04T12:33:36.6964184Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6964854Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda
2025-12-04T12:33:36.6965388Z 
2025-12-04T12:33:36.6965559Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6965945Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6966227Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:33:36.6966451Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T12:33:36.6966805Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T12:33:36.6967179Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)]
2025-12-04T12:33:36.6967419Z graph_break []
2025-12-04T12:33:36.6967609Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)]
2025-12-04T12:33:36.6967912Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6968199Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:33:36.6968414Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T12:33:36.6968760Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T12:33:36.6969127Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)]
2025-12-04T12:33:36.6969363Z graph_break []
2025-12-04T12:33:36.6969557Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)]
2025-12-04T12:33:36.6969834Z =================================== FAILURES ===================================
2025-12-04T12:33:36.6970285Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _
2025-12-04T12:33:36.6970717Z Traceback (most recent call last):
2025-12-04T12:33:36.6971178Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6971635Z     method(*args, **kwargs)
2025-12-04T12:33:36.6972041Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6972499Z     method(*args, **kwargs)
2025-12-04T12:33:36.6972946Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.6973391Z     with policy():
2025-12-04T12:33:36.6973787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.6974246Z     raise RuntimeError(msg)
2025-12-04T12:33:36.6975196Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 276824064 and is now reported as 276825088 on device 0. CUDA driver allocated memory was 742391808 and is now 767557632.
2025-12-04T12:33:36.6976081Z 
2025-12-04T12:33:36.6976218Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6976932Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda
2025-12-04T12:33:36.6977506Z 
2025-12-04T12:33:36.6977672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6978080Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6978363Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:33:36.6978625Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T12:33:36.6978989Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T12:33:36.6979370Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)]
2025-12-04T12:33:36.6979624Z graph_break []
2025-12-04T12:33:36.6979828Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)]
2025-12-04T12:33:36.6980147Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6980424Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:33:36.6980638Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T12:33:36.6980985Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T12:33:36.6981353Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)]
2025-12-04T12:33:36.6981602Z graph_break []
2025-12-04T12:33:36.6981806Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)]
2025-12-04T12:33:36.6982117Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.6982388Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:33:36.6982601Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T12:33:36.6982956Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T12:33:36.6983334Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)]
2025-12-04T12:33:36.6983574Z graph_break []
2025-12-04T12:33:36.6983769Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)]
2025-12-04T12:33:36.6984341Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b260dbfbe2039817.xml -
2025-12-04T12:33:36.6984917Z =========================== short test summary info ============================
2025-12-04T12:33:36.6986329Z FAILED [0.5015s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 276824064 and is now reported as 276825088 on device 0. CUDA driver allocated memory was 742391808 and is now 767557632.
2025-12-04T12:33:36.6987624Z 
2025-12-04T12:33:36.6987755Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.6988411Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda
2025-12-04T12:33:36.6988958Z 
2025-12-04T12:33:36.6989119Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.6989467Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:33:36.6989793Z ======= 1 failed, 45 passed, 9 skipped, 23 deselected, 2 rerun in 34.64s =======
2025-12-04T12:33:36.6990077Z Got exit code 1
2025-12-04T12:33:36.6990242Z Retrying single test...
2025-12-04T12:33:36.6990639Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6d60c9510a0f37ea.xml
2025-12-04T12:33:36.6991093Z ============================= test session starts ==============================
2025-12-04T12:33:36.6991479Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:33:36.6991831Z cachedir: .pytest_cache
2025-12-04T12:33:36.6992306Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:33:36.6992804Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:33:36.6993021Z configfile: pytest.ini
2025-12-04T12:33:36.6993475Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:33:36.6994097Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T12:33:36.6994807Z stepcurrent: skipping 77 already run items. Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda
2025-12-04T12:33:36.6995464Z Running 1 items in this shard
2025-12-04T12:33:36.6995592Z 
2025-12-04T12:33:36.6996223Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda W1204 12:33:04.450000 170395 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:33:36.6997021Z ('RERUN', {'yellow': True}) [1.7100s] [100%]
2025-12-04T12:33:36.6997558Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2665s] [100%]
2025-12-04T12:33:36.6998010Z 
2025-12-04T12:33:36.6998102Z ==================================== RERUNS ====================================
2025-12-04T12:33:36.6998547Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _
2025-12-04T12:33:36.6998977Z Traceback (most recent call last):
2025-12-04T12:33:36.6999430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.6999892Z     method(*args, **kwargs)
2025-12-04T12:33:36.7000352Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7000800Z     method(*args, **kwargs)
2025-12-04T12:33:36.7001227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.7001669Z     with policy():
2025-12-04T12:33:36.7002073Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.7002520Z     raise RuntimeError(msg)
2025-12-04T12:33:36.7003417Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 230686720 and is now 281018368.
2025-12-04T12:33:36.7004280Z 
2025-12-04T12:33:36.7004416Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.7005076Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda
2025-12-04T12:33:36.7005623Z 
2025-12-04T12:33:36.7005784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.7006160Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.7006438Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:33:36.7006658Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T12:33:36.7006943Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)]
2025-12-04T12:33:36.7007315Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T12:33:36.7007632Z graph_break []
2025-12-04T12:33:36.7007841Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)]
2025-12-04T12:33:36.7008457Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6d60c9510a0f37ea.xml -
2025-12-04T12:33:36.7009072Z ================== 1 passed, 187 deselected, 1 rerun in 2.02s ==================
2025-12-04T12:33:36.7009327Z Got exit code 0
2025-12-04T12:33:36.7009571Z Test succeeded in new process, continuing with the rest of the tests
2025-12-04T12:33:36.7010101Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3a4ab85c9fd562b7.xml
2025-12-04T12:33:36.7010582Z ============================= test session starts ==============================
2025-12-04T12:33:36.7010979Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:33:36.7011328Z cachedir: .pytest_cache
2025-12-04T12:33:36.7011742Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:33:36.7012190Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:33:36.7012400Z configfile: pytest.ini
2025-12-04T12:33:36.7012852Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:33:36.7013403Z collecting ... collected 188 items / 78 deselected / 110 selected
2025-12-04T12:33:36.7013701Z stepcurrent: skipping 78 already run items.
2025-12-04T12:33:36.7013933Z Running 110 items in this shard
2025-12-04T12:33:36.7014059Z 
2025-12-04T12:33:36.7014707Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda W1204 12:33:13.688000 170576 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T12:33:36.7015506Z ('RERUN', {'yellow': True}) [1.6993s] [  0%]
2025-12-04T12:33:36.7016054Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2742s] [  0%]
2025-12-04T12:33:36.7016880Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2625s] [  1%]
2025-12-04T12:33:36.7017896Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2683s] [  2%]
2025-12-04T12:33:36.7018780Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2725s] [  2%]
2025-12-04T12:33:36.7019587Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2803s] [  3%]
2025-12-04T12:33:36.7020445Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2513s] [  4%]
2025-12-04T12:33:36.7021311Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2483s] [  4%]
2025-12-04T12:33:36.7022120Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2520s] [  5%]
2025-12-04T12:33:36.7022922Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2490s] [  6%]
2025-12-04T12:33:36.7023723Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2495s] [  7%]
2025-12-04T12:33:36.7024523Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2489s] [  8%]
2025-12-04T12:33:36.7025322Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2494s] [  9%]
2025-12-04T12:33:36.7026255Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2485s] [ 10%]
2025-12-04T12:33:36.7027115Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2519s] [ 10%]
2025-12-04T12:33:36.7028076Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2496s] [ 10%]
2025-12-04T12:33:36.7028887Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2508s] [ 11%]
2025-12-04T12:33:36.7029693Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2491s] [ 12%]
2025-12-04T12:33:36.7030548Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2504s] [ 13%]
2025-12-04T12:33:36.7031405Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2510s] [ 13%]
2025-12-04T12:33:36.7032209Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2500s] [ 14%]
2025-12-04T12:33:36.7033060Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2503s] [ 15%]
2025-12-04T12:33:36.7033915Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2517s] [ 15%]
2025-12-04T12:33:36.7034713Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2529s] [ 16%]
2025-12-04T12:33:36.7035525Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2506s] [ 17%]
2025-12-04T12:33:36.7036317Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2515s] [ 18%]
2025-12-04T12:33:36.7037123Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2511s] [ 19%]
2025-12-04T12:33:36.7037923Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2518s] [ 20%]
2025-12-04T12:33:36.7038713Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2512s] [ 20%]
2025-12-04T12:33:36.7039499Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2530s] [ 21%]
2025-12-04T12:33:36.7040345Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2530s] [ 22%]
2025-12-04T12:33:36.7041146Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2514s] [ 23%]
2025-12-04T12:33:36.7041938Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2512s] [ 24%]
2025-12-04T12:33:36.7042726Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2523s] [ 25%]
2025-12-04T12:33:36.7043555Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2520s] [ 26%]
2025-12-04T12:33:36.7044491Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2560s] [ 27%]
2025-12-04T12:33:36.7045446Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda PASSED [0.2503s] [ 27%]
2025-12-04T12:33:36.7046423Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda PASSED [0.2506s] [ 28%]
2025-12-04T12:33:36.7047312Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda PASSED [0.2552s] [ 29%]
2025-12-04T12:33:36.7048189Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda PASSED [0.2542s] [ 30%]
2025-12-04T12:33:36.7049082Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda PASSED [0.2542s] [ 30%]
2025-12-04T12:33:36.7049941Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda PASSED [0.2505s] [ 31%]
2025-12-04T12:33:36.7050791Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda PASSED [0.2541s] [ 32%]
2025-12-04T12:33:36.7051634Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda PASSED [0.2530s] [ 33%]
2025-12-04T12:33:36.7052481Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda PASSED [0.2530s] [ 34%]
2025-12-04T12:33:36.7053343Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda PASSED [0.2510s] [ 35%]
2025-12-04T12:33:36.7054196Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda PASSED [0.2539s] [ 36%]
2025-12-04T12:33:36.7055044Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda PASSED [0.2543s] [ 37%]
2025-12-04T12:33:36.7055954Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 38%]
2025-12-04T12:33:36.7056932Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 39%]
2025-12-04T12:33:36.7057907Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 40%]
2025-12-04T12:33:36.7058850Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 40%]
2025-12-04T12:33:36.7059606Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_scaled_mm_preserves_strides_cuda PASSED [0.6188s] [ 41%]
2025-12-04T12:33:36.7060305Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2562s] [ 42%]
2025-12-04T12:33:36.7061157Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2538s] [ 43%]
2025-12-04T12:33:36.7062049Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2534s] [ 44%]
2025-12-04T12:33:36.7062917Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2547s] [ 45%]
2025-12-04T12:33:36.7063778Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2539s] [ 46%]
2025-12-04T12:33:36.7064642Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2547s] [ 47%]
2025-12-04T12:33:36.7065480Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2539s] [ 48%]
2025-12-04T12:33:36.7066314Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2540s] [ 49%]
2025-12-04T12:33:36.7067132Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2548s] [ 50%]
2025-12-04T12:33:36.7067951Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2549s] [ 50%]
2025-12-04T12:33:36.7068775Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2541s] [ 51%]
2025-12-04T12:33:36.7069587Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2574s] [ 52%]
2025-12-04T12:33:36.7070409Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2550s] [ 53%]
2025-12-04T12:33:36.7071242Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2557s] [ 54%]
2025-12-04T12:33:36.7072069Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2556s] [ 55%]
2025-12-04T12:33:36.7072888Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2559s] [ 56%]
2025-12-04T12:33:36.7073714Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2553s] [ 57%]
2025-12-04T12:33:36.7074545Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2559s] [ 58%]
2025-12-04T12:33:36.7075376Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2565s] [ 59%]
2025-12-04T12:33:36.7076204Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2570s] [ 60%]
2025-12-04T12:33:36.7077026Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2568s] [ 60%]
2025-12-04T12:33:36.7077841Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2561s] [ 61%]
2025-12-04T12:33:36.7078647Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2565s] [ 62%]
2025-12-04T12:33:36.7079459Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2568s] [ 63%]
2025-12-04T12:33:36.7080364Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda PASSED [0.2573s] [ 64%]
2025-12-04T12:33:36.7081251Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda PASSED [0.2578s] [ 65%]
2025-12-04T12:33:36.7082102Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda PASSED [0.2571s] [ 66%]
2025-12-04T12:33:36.7082961Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda PASSED [0.2570s] [ 67%]
2025-12-04T12:33:36.7083773Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda PASSED [0.2571s] [ 68%]
2025-12-04T12:33:36.7084582Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda PASSED [0.2577s] [ 69%]
2025-12-04T12:33:36.7085649Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0032s] (XPU does not support use_fast_accum=True for now) [ 70%]
2025-12-04T12:33:36.7086927Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0030s] (XPU does not support use_fast_accum=True for now) [ 70%]
2025-12-04T12:33:36.7088201Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0027s] (XPU does not support use_fast_accum=True for now) [ 71%]
2025-12-04T12:33:36.7089469Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0026s] (XPU does not support use_fast_accum=True for now) [ 72%]
2025-12-04T12:33:36.7090718Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0027s] (XPU does not support use_fast_accum=True for now) [ 73%]
2025-12-04T12:33:36.7091954Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 74%]
2025-12-04T12:33:36.7093178Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 75%]
2025-12-04T12:33:36.7094408Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0028s] (XPU does not support use_fast_accum=True for now) [ 76%]
2025-12-04T12:33:36.7095640Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 77%]
2025-12-04T12:33:36.7096870Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 78%]
2025-12-04T12:33:36.7098100Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 79%]
2025-12-04T12:33:36.7099368Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0027s] (XPU does not support use_fast_accum=True for now) [ 80%]
2025-12-04T12:33:36.7100653Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 80%]
2025-12-04T12:33:36.7101988Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 81%]
2025-12-04T12:33:36.7103253Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0025s] (bias is not supported when output dtype is float32) [ 82%]
2025-12-04T12:33:36.7104524Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0023s] (bias is not supported when output dtype is float32) [ 83%]
2025-12-04T12:33:36.7105771Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 84%]
2025-12-04T12:33:36.7106997Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 85%]
2025-12-04T12:33:36.7108222Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0026s] (bias is not supported when output dtype is float32) [ 86%]
2025-12-04T12:33:36.7109448Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0023s] (bias is not supported when output dtype is float32) [ 87%]
2025-12-04T12:33:36.7110668Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 88%]
2025-12-04T12:33:36.7111907Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0025s] (XPU does not support use_fast_accum=True for now) [ 89%]
2025-12-04T12:33:36.7113129Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0023s] (bias is not supported when output dtype is float32) [ 90%]
2025-12-04T12:33:36.7114348Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0026s] (bias is not supported when output dtype is float32) [ 90%]
2025-12-04T12:33:36.7115526Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False_cuda_bfloat16 SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 91%]
2025-12-04T12:33:36.7116628Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True_cuda_bfloat16 SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 92%]
2025-12-04T12:33:36.7117945Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False_cuda_bfloat16 SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 93%]
2025-12-04T12:33:36.7119071Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True_cuda_bfloat16 SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 94%]
2025-12-04T12:33:36.7120327Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False_cuda_float32 SKIPPED [0.0004s] (Need device-side TMA support in Triton) [ 95%]
2025-12-04T12:33:36.7121412Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True_cuda_float32 SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 96%]
2025-12-04T12:33:36.7122467Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False_cuda_float32 SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 97%]
2025-12-04T12:33:36.7123502Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True_cuda_float32 SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 98%]
2025-12-04T12:33:36.7124641Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_input_dims_cuda E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] failed while attempting to run meta for aten._scaled_mm.default
2025-12-04T12:33:36.7125563Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] Traceback (most recent call last):
2025-12-04T12:33:36.7126433Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 2823, in _dispatch_impl
2025-12-04T12:33:36.7127272Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]     r = func(*args, **kwargs)
2025-12-04T12:33:36.7128026Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T12:33:36.7128804Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]     return self._op(*args, **kwargs)
2025-12-04T12:33:36.7129651Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_meta_registrations.py", line 6528, in meta_scaled_mm
2025-12-04T12:33:36.7130487Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]     return _check_scaled_mm_sizes(
2025-12-04T12:33:36.7131342Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_meta_registrations.py", line 6384, in _check_scaled_mm_sizes
2025-12-04T12:33:36.7132153Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]     torch._check(
2025-12-04T12:33:36.7132903Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py", line 1734, in _check
2025-12-04T12:33:36.7133804Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]     _check_with(RuntimeError, cond, message)  # pyrefly: ignore [bad-argument-type]
2025-12-04T12:33:36.7134718Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py", line 1716, in _check_with
2025-12-04T12:33:36.7135529Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]     raise error_type(message_evaluated)
2025-12-04T12:33:36.7136311Z E1204 12:33:34.808000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] RuntimeError: Expected self.size(1) to be divisible by 16, but got self.size(1)=15
2025-12-04T12:33:36.7136878Z PASSED [0.2020s] [ 99%]
2025-12-04T12:33:36.7137607Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_scale_dims_rowwise_scaling_cuda E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] failed while attempting to run meta for aten._scaled_mm.default
2025-12-04T12:33:36.7138645Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] Traceback (most recent call last):
2025-12-04T12:33:36.7139496Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 2823, in _dispatch_impl
2025-12-04T12:33:36.7140329Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]     r = func(*args, **kwargs)
2025-12-04T12:33:36.7141090Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T12:33:36.7141865Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]     return self._op(*args, **kwargs)
2025-12-04T12:33:36.7142700Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_meta_registrations.py", line 6528, in meta_scaled_mm
2025-12-04T12:33:36.7143527Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]     return _check_scaled_mm_sizes(
2025-12-04T12:33:36.7144386Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_meta_registrations.py", line 6498, in _check_scaled_mm_sizes
2025-12-04T12:33:36.7145209Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]     torch._check(
2025-12-04T12:33:36.7145949Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py", line 1734, in _check
2025-12-04T12:33:36.7146843Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]     _check_with(RuntimeError, cond, message)  # pyrefly: ignore [bad-argument-type]
2025-12-04T12:33:36.7147759Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py", line 1716, in _check_with
2025-12-04T12:33:36.7148560Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0]     raise error_type(message_evaluated)
2025-12-04T12:33:36.7150098Z E1204 12:33:35.012000 170576 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] RuntimeError: Invalid scaling configuration. For tensorwise scaling, both scales should be scalar. For rowwise scaling, scale_a should be (233, 1), scale_b should be (1, 128). For (BlockWise1x128, BlockWise128x128), scale_a should be (233, 1), scale_b should be (1, 1). For (BlockWise1x128, BlockWise1x128), scale_a should be (233, 1), scale_b should be (1, 128). Got scale_a.size()=(1, 128) and scale_b.size()=(233, 1)
2025-12-04T12:33:36.7151420Z PASSED [0.1924s] [100%]
2025-12-04T12:33:36.7151527Z 
2025-12-04T12:33:36.7151620Z ==================================== RERUNS ====================================
2025-12-04T12:33:36.7152073Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _
2025-12-04T12:33:36.7152508Z Traceback (most recent call last):
2025-12-04T12:33:36.7152960Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7153402Z     method(*args, **kwargs)
2025-12-04T12:33:36.7153861Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7154349Z     method(*args, **kwargs)
2025-12-04T12:33:36.7154759Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.7155233Z     with policy():
2025-12-04T12:33:36.7155630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.7156109Z     raise RuntimeError(msg)
2025-12-04T12:33:36.7157025Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 230686720 and is now 278921216.
2025-12-04T12:33:36.7157890Z 
2025-12-04T12:33:36.7158022Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.7158694Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda
2025-12-04T12:33:36.7159234Z 
2025-12-04T12:33:36.7159399Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.7159772Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.7160089Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:33:36.7160330Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T12:33:36.7160619Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)]
2025-12-04T12:33:36.7160987Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T12:33:36.7161310Z graph_break []
2025-12-04T12:33:36.7161531Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)]
2025-12-04T12:33:36.7161999Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _
2025-12-04T12:33:36.7162427Z Traceback (most recent call last):
2025-12-04T12:33:36.7162870Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7163324Z     method(*args, **kwargs)
2025-12-04T12:33:36.7163737Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7164174Z     method(*args, **kwargs)
2025-12-04T12:33:36.7164592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.7165027Z     with policy():
2025-12-04T12:33:36.7165426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.7165869Z     raise RuntimeError(msg)
2025-12-04T12:33:36.7166765Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 260046848 and is now 281018368.
2025-12-04T12:33:36.7167615Z 
2025-12-04T12:33:36.7167757Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.7168222Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda
2025-12-04T12:33:36.7168227Z 
2025-12-04T12:33:36.7168391Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.7168530Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.7168600Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:33:36.7168701Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T12:33:36.7168974Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T12:33:36.7169092Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)]
2025-12-04T12:33:36.7169157Z graph_break []
2025-12-04T12:33:36.7169301Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)]
2025-12-04T12:33:36.7169577Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _
2025-12-04T12:33:36.7169691Z Traceback (most recent call last):
2025-12-04T12:33:36.7170000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7170081Z     method(*args, **kwargs)
2025-12-04T12:33:36.7170380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7170445Z     method(*args, **kwargs)
2025-12-04T12:33:36.7170742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.7170812Z     with policy():
2025-12-04T12:33:36.7171110Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.7171189Z     raise RuntimeError(msg)
2025-12-04T12:33:36.7171975Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 260046848 and is now 281018368.
2025-12-04T12:33:36.7171980Z 
2025-12-04T12:33:36.7172112Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.7172573Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda
2025-12-04T12:33:36.7172578Z 
2025-12-04T12:33:36.7172747Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.7172877Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.7172948Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:33:36.7173050Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T12:33:36.7173239Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T12:33:36.7173354Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)]
2025-12-04T12:33:36.7173422Z graph_break []
2025-12-04T12:33:36.7173530Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)]
2025-12-04T12:33:36.7173813Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _
2025-12-04T12:33:36.7173887Z Traceback (most recent call last):
2025-12-04T12:33:36.7174205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7174280Z     method(*args, **kwargs)
2025-12-04T12:33:36.7174578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7174652Z     method(*args, **kwargs)
2025-12-04T12:33:36.7174952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.7175021Z     with policy():
2025-12-04T12:33:36.7175338Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.7175410Z     raise RuntimeError(msg)
2025-12-04T12:33:36.7176266Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 260046848 and is now 281018368.
2025-12-04T12:33:36.7176338Z 
2025-12-04T12:33:36.7176471Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.7176923Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda
2025-12-04T12:33:36.7176962Z 
2025-12-04T12:33:36.7177164Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.7177295Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.7177363Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:33:36.7177464Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T12:33:36.7177643Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T12:33:36.7177760Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)]
2025-12-04T12:33:36.7177824Z graph_break []
2025-12-04T12:33:36.7177926Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)]
2025-12-04T12:33:36.7178204Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _
2025-12-04T12:33:36.7178277Z Traceback (most recent call last):
2025-12-04T12:33:36.7178578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7178661Z     method(*args, **kwargs)
2025-12-04T12:33:36.7178958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7179025Z     method(*args, **kwargs)
2025-12-04T12:33:36.7179317Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.7179376Z     with policy():
2025-12-04T12:33:36.7179676Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.7179745Z     raise RuntimeError(msg)
2025-12-04T12:33:36.7180523Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 260046848 and is now 281018368.
2025-12-04T12:33:36.7180529Z 
2025-12-04T12:33:36.7180655Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.7181106Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda
2025-12-04T12:33:36.7181118Z 
2025-12-04T12:33:36.7181278Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.7181406Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.7181491Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:33:36.7181587Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T12:33:36.7181768Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T12:33:36.7181885Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)]
2025-12-04T12:33:36.7181945Z graph_break []
2025-12-04T12:33:36.7182047Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)]
2025-12-04T12:33:36.7182322Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _
2025-12-04T12:33:36.7182393Z Traceback (most recent call last):
2025-12-04T12:33:36.7182700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7182763Z     method(*args, **kwargs)
2025-12-04T12:33:36.7183108Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7183213Z     method(*args, **kwargs)
2025-12-04T12:33:36.7183508Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.7183602Z     with policy():
2025-12-04T12:33:36.7183901Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.7184000Z     raise RuntimeError(msg)
2025-12-04T12:33:36.7184780Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 260046848 and is now 281018368.
2025-12-04T12:33:36.7184784Z 
2025-12-04T12:33:36.7184907Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.7185371Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda
2025-12-04T12:33:36.7185375Z 
2025-12-04T12:33:36.7185533Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.7185663Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.7185751Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:33:36.7185847Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T12:33:36.7186024Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T12:33:36.7186149Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)]
2025-12-04T12:33:36.7186209Z graph_break []
2025-12-04T12:33:36.7186313Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)]
2025-12-04T12:33:36.7186625Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _
2025-12-04T12:33:36.7186700Z Traceback (most recent call last):
2025-12-04T12:33:36.7187004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7187070Z     method(*args, **kwargs)
2025-12-04T12:33:36.7187373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T12:33:36.7187446Z     method(*args, **kwargs)
2025-12-04T12:33:36.7187740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T12:33:36.7187805Z     with policy():
2025-12-04T12:33:36.7188102Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T12:33:36.7188169Z     raise RuntimeError(msg)
2025-12-04T12:33:36.7189004Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 260046848 and is now 299892736.
2025-12-04T12:33:36.7189014Z 
2025-12-04T12:33:36.7189145Z To execute this test, run the following from the base repo dir:
2025-12-04T12:33:36.7189640Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda
2025-12-04T12:33:36.7189644Z 
2025-12-04T12:33:36.7189800Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:33:36.7189931Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:33:36.7190000Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:33:36.7190136Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T12:33:36.7190371Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T12:33:36.7190481Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)]
2025-12-04T12:33:36.7190541Z graph_break []
2025-12-04T12:33:36.7190716Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)]
2025-12-04T12:33:36.7191141Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3a4ab85c9fd562b7.xml -
2025-12-04T12:33:36.7191284Z =========== 74 passed, 36 skipped, 78 deselected, 7 rerun in 22.59s ============
2025-12-04T12:33:36.7191823Z The following tests failed and then succeeded when run in a new process['test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda']
2025-12-04T12:33:36.7192149Z The following tests failed consistently: ['test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16']
2025-12-04T12:33:36.7192155Z 
2025-12-04T12:33:36.7192432Z FINISHED PRINTING LOG FILE of inductor/test_fp8 1/1 (test/test-reports/inductor.test_fp8_1.1_041887d0b8d7fee8_.log)
2025-12-04T12:33:36.7192436Z 
2025-12-04T12:33:36.7192617Z Finished inductor/test_fp8 1/1 ... [2025-12-04 12:33:36.640779][13256.569074556], took 2.35min
2025-12-04T12:33:36.7193048Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dccd0f4af0dde98e.xml
2025-12-04T12:33:36.7844090Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-70b63fabd52069fd.xml
2025-12-04T12:33:36.8111449Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8f8e73b1ecdff271.xml
2025-12-04T12:33:36.8409105Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b260dbfbe2039817.xml
2025-12-04T12:33:36.8660677Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6d60c9510a0f37ea.xml
2025-12-04T12:33:36.8891342Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3a4ab85c9fd562b7.xml
2025-12-04T12:33:37.4412432Z Uploading logs for 57116084862 to S3
2025-12-04T12:33:37.6024080Z Uploading artifacts took 0.68 seconds
2025-12-04T12:33:37.6024449Z inductor/test_fp8 1/1 failed!
2025-12-04T12:33:37.6027334Z Running inductor/test_flex_flash 1/1 ... [2025-12-04 12:33:37.602523][13257.530819143]
2025-12-04T12:33:37.6027712Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:33:37.6031524Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_flex_flash.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:33:37.602914]
2025-12-04T12:33:41.7745606Z 
2025-12-04T12:33:41.7747190Z inductor/test_flex_flash 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_flex_flash_1.1_143acf3e8eed598e_.log
2025-12-04T12:33:41.7770389Z Running 58 items in this shard: test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_kernel_called_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_kernel_called_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_rejects_mask_mod_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_rejects_mask_mod_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_rejects_score_mod_capture_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_rejects_score_mod_capture_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_rejects_score_mod_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_backward_rejects_score_mod_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_basic_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_basic_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_block_mask_with_score_mod_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_block_mask_with_score_mod_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_kernel_called_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_kernel_called_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_mask_mod_with_dual_buffers_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_mask_mod_with_dual_buffers_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_mask_mod_with_view_buffer_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_mask_mod_with_view_buffer_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_score_mod_with_many_buffer_indexing_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_score_mod_with_many_buffer_indexing_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_127_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_127_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_255_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_255_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_383_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_383_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_511_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_unfriendly_seqlen_with_causal_seq_len_511_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_alibi_learned_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_alibi_learned_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_batch_bias_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_batch_bias_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_batch_head_bias_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_batch_head_bias_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_block_mask_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_block_mask_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_doc_mask_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_doc_mask_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_dual_buffer_bias_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_dual_buffer_bias_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_head_scale_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_head_scale_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_mask_mod_buffer_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_mask_mod_buffer_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_pos_bias_table_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_pos_bias_table_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_and_mask_buffers_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_and_mask_buffers_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_mod_causal_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_mod_causal_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_mod_rel_bias_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_mod_rel_bias_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_mod_times_two_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_mod_times_two_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_view_buffer_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_attention_with_score_view_buffer_cuda_float16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_impl_error_with_requires_grad_cuda_bfloat16, test/inductor/test_flex_flash.py::TestFlexFlashCUDA::test_flash_impl_error_with_requires_grad_cuda_float16
2025-12-04T12:33:41.7788491Z 
2025-12-04T12:33:41.7788706Z Finished inductor/test_flex_flash 1/1 ... [2025-12-04 12:33:41.774326][13261.70261849], took 0.07min
2025-12-04T12:33:41.7998913Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_flex_flash/inductor.test_flex_flash-594b02277dbafddb.xml
2025-12-04T12:33:41.8314958Z Running inductor/test_segmented_tree 1/1 ... [2025-12-04 12:33:41.831263][13261.759562402]
2025-12-04T12:33:41.8315445Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:33:41.8318226Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_segmented_tree.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:33:41.831546]
2025-12-04T12:33:45.3021862Z 
2025-12-04T12:33:45.3022814Z inductor/test_segmented_tree 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_segmented_tree_1.1_84df512657bb7938_.log
2025-12-04T12:33:45.3027026Z Running 12 items in this shard: test/inductor/test_segmented_tree.py::TestSegmentedTree::test_basic_construction, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_boundary_conditions, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_empty_array, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_full_array_range, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_invalid_ranges, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_max_query_matches_naive, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_multiple_operations, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_out_of_bounds, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_overlapping_updates, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_range_update, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_sequential_updates_and_queries, test/inductor/test_segmented_tree.py::TestSegmentedTree::test_single_element_ranges
2025-12-04T12:33:45.3030075Z 
2025-12-04T12:33:45.3030314Z Finished inductor/test_segmented_tree 1/1 ... [2025-12-04 12:33:45.301915][13265.230210049], took 0.06min
2025-12-04T12:33:45.3283997Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_segmented_tree/inductor.test_segmented_tree-3edcd06141f9e439.xml
2025-12-04T12:33:45.3583988Z Running inductor/test_kernel_optimization 1/1 ... [2025-12-04 12:33:45.358169][13265.286467138]
2025-12-04T12:33:45.3584626Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:33:45.3587701Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_kernel_optimization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:33:45.358490]
2025-12-04T12:33:56.9926748Z 
2025-12-04T12:33:56.9927686Z inductor/test_kernel_optimization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_kernel_optimization_1.1_55e099cc3f4c3e00_.log
2025-12-04T12:33:56.9928836Z Running 1 items in this shard: test/inductor/test_kernel_optimization.py::TestKernelOptimization::test_einsum_to_pointwise
2025-12-04T12:33:56.9929320Z 
2025-12-04T12:33:56.9929631Z Finished inductor/test_kernel_optimization 1/1 ... [2025-12-04 12:33:56.992404][13276.92069971], took 0.19min
2025-12-04T12:33:57.0180820Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_optimization/inductor.test_kernel_optimization-b711ca038cf9dded.xml
2025-12-04T12:33:57.1062450Z Running inductor/test_metrics 1/1 ... [2025-12-04 12:33:57.105973][13277.034271488]
2025-12-04T12:33:57.1062996Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:33:57.1065692Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_metrics.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:33:57.106299]
2025-12-04T12:34:06.2366071Z 
2025-12-04T12:34:06.2367765Z inductor/test_metrics 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_metrics_1.1_8516a4d7ea2a79fb_.log
2025-12-04T12:34:06.2370006Z Running 6 items in this shard: test/inductor/test_metrics.py::TestMetrics::test_atomic_add, test/inductor/test_metrics.py::TestMetrics::test_count_args, test/inductor/test_metrics.py::TestMetrics::test_count_pattern, test/inductor/test_metrics.py::TestMetrics::test_kernel_args_num_gb, test/inductor/test_metrics.py::TestMetrics::test_parse_proper_kernel_fn_code, test/inductor/test_metrics.py::TestMetrics::test_parse_reduction_hint
2025-12-04T12:34:06.2371557Z 
2025-12-04T12:34:06.2371851Z Finished inductor/test_metrics 1/1 ... [2025-12-04 12:34:06.236186][13286.164474624], took 0.15min
2025-12-04T12:34:06.2622687Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_metrics/inductor.test_metrics-b803ef0c4a9491e7.xml
2025-12-04T12:34:06.3352535Z Running export/test_unflatten_training_ir 1/1 ... [2025-12-04 12:34:06.334984][13286.263282146]
2025-12-04T12:34:06.3353067Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:34:06.3355612Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_unflatten_training_ir.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:34:06.335294]
2025-12-04T12:34:19.0719529Z 
2025-12-04T12:34:19.0721141Z export/test_unflatten_training_ir 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_unflatten_training_ir_1.1_381cd1c16fe9e11b_.log
2025-12-04T12:34:19.0734199Z Running 29 items in this shard: test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_assert_tensor_metadata_stack_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_attr_as_submod_input_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_dedup_sym_size_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_double_nested_submodule_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_duplicate_placeholder_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_fx_trace_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_nested_leaf_non_strict_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_placeholder_and_get_attr_ordering_after_unflattened_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_simple_alias_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_buffer_mutation_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_constant_obj_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_constant_tensor_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_container_type_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_eager_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_empty_branch_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_nested_access_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_nested_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_none_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_param_list_dict_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_preserve_signature_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_preserve_with_unused_input_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_requires_grad_param_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_root_module_type_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_shared_submodule_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_skipped_call_module_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_submodule_ordering_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_with_inplace_compile_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflatten_wrong_input_training_ir, test/export/test_unflatten_training_ir.py::TrainingIRUnflattenTestUnflatten::test_unflattened_module_nodes_has_meta_val_training_ir
2025-12-04T12:34:19.0744643Z 
2025-12-04T12:34:19.0744900Z Finished export/test_unflatten_training_ir 1/1 ... [2025-12-04 12:34:19.071772][13299.000066202], took 0.21min
2025-12-04T12:34:19.0979105Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_unflatten_training_ir/export.test_unflatten_training_ir-0b06d16f89271e02.xml
2025-12-04T12:34:19.1755263Z Running inductor/test_triton_kernels 1/1 ... [2025-12-04 12:34:19.175254][13299.103552035]
2025-12-04T12:34:19.1756023Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:34:19.1758421Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_kernels.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:34:19.175551]
2025-12-04T12:36:34.8784508Z 
2025-12-04T12:36:34.8787460Z inductor/test_triton_kernels 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_kernels_1.1_cd88835deb58b6d4_.log
2025-12-04T12:36:34.8908368Z Running 366 items in this shard: test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_False_autotune_False, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_False_autotune_True, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_True_autotune_False, test/inductor/test_triton_kernels.py::KernelTests::test_constexpr_dynamic_shapes_wrapped_True_autotune_True, test/inductor/test_triton_kernels.py::KernelTests::test_i64_input, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_inline_asm_quotes_double, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_inline_asm_quotes_single, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_with_docstring_quotes_double, test/inductor/test_triton_kernels.py::KernelTests::test_kernel_with_docstring_quotes_single, test/inductor/test_triton_kernels.py::KernelTests::test_layout_constraint_needs_fixed_stride_order, test/inductor/test_triton_kernels.py::KernelTests::test_no_nan_kernels, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_on_device_tma_dynamic_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_capture_and_functionalize_dynamic_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_inductor_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_False_backend_inductor_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_inductor_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_1d_dynamic_True_backend_inductor_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_False_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_aot_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_aot_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_eager_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_2d_dynamic_True_backend_eager_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_dedup_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_descriptor_dedup_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_False_after_create_desc_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_False_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_False_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_True_tma_version_new, test/inductor/test_triton_kernels.py::KernelTests::test_tma_graph_breaks_after_data_ptr_True_after_create_desc_True_tma_version_old, test/inductor/test_triton_kernels.py::KernelTests::test_triton_attrs_dict_equal_1_None_format, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_eager_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3_tdlp_0, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_2d_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3_tdlp_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_False_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_False_dynamic_True_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_False_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_aot_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_eager_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_1, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_2, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_grad_True_dynamic_True_backend_inductor_grid_type_3, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_autotune_with_unsupported_args_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_caching, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_caching_duplicate, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_constants, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dependancies, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_16_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_16_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_4_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_different_shapes_size_4_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dtype_view_cfg_cpp_wrapper, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_dtype_view_cfg_normal, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_empty_autotune_config_dict_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_emulate_precision_mm_kernels_do_not_change, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_emulate_precision_unaffected, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_0_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_0_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_1_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_arg_dump_launch_params_1_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_float_arg_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_equal_to_1_float_arg_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_fallback, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float16, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float32, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_float64_constant_float64, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_functionalize, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_global_constexpr, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_higher_order_func, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inner_triton_function_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_inputs_buffer_reuse, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_matmul_tracking, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multi_kernel_grad_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multi_kernel_grad_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_multiple_outputs_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_mutation_not_mark_dirty, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_mutation_type, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_False_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_native_grad_True_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_False_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_False_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_True_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_no_clones_grad_True_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_none_args, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_num_ctas_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_out_of_order, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_reinplace_inplaceable_pass, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_restore_value_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_slice_and_view_input, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_with_autotune_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_kwargs_without_autotune_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_special_params_autotune_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_strided_input, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_strided_input_nonzero_offset, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_to_cpu, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_tracing_dynamic_False, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_tracing_dynamic_True, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_triton_dtype_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_unbacked_shape_tensor_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_various_args, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_constexpr_function, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn0_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_grad_option_grad_fn1_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_imported_symbol, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_imported_symbol_with_custom_name, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_kernel_param, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_False_backend_inductor, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_aot_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_eager, test/inductor/test_triton_kernels.py::KernelTests::test_triton_kernel_with_views_dynamic_True_backend_inductor, test/inductor/test_triton_kernels.py::MutationTests::test_add_for_loop, test/inductor/test_triton_kernels.py::MutationTests::test_add_for_loop2, test/inductor/test_triton_kernels.py::MutationTests::test_add_kernel_on_device_tma_new_api, test/inductor/test_triton_kernels.py::MutationTests::test_add_kernel_on_device_tma_old_api, test/inductor/test_triton_kernels.py::MutationTests::test_add_nested_for_loop, test/inductor/test_triton_kernels.py::MutationTests::test_add_nested_for_loop_multi_return, test/inductor/test_triton_kernels.py::MutationTests::test_argmax, test/inductor/test_triton_kernels.py::MutationTests::test_branch_with_multiple_yield_args, test/inductor/test_triton_kernels.py::MutationTests::test_cumsum, test/inductor/test_triton_kernels.py::MutationTests::test_fn_call_multi_return, test/inductor/test_triton_kernels.py::MutationTests::test_fn_call_one_return, test/inductor/test_triton_kernels.py::MutationTests::test_for_loop_arg, test/inductor/test_triton_kernels.py::MutationTests::test_for_loop_arg_2, test/inductor/test_triton_kernels.py::MutationTests::test_get_tma_stores, test/inductor/test_triton_kernels.py::MutationTests::test_labels, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_4_times_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_2d_autotuned, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_with_block_ptr, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_add_kernel_with_import, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_atomic_add_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_cond_op_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_indirection_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_indirection_kernel1, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_inline_asm_kernel_is_pure_false, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_inline_asm_kernel_is_pure_true, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_kernel_with_block_ptr_2d, test/inductor/test_triton_kernels.py::MutationTests::test_mutations_mul2_inplace_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_nested_cond_op_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_out_of_order_kernel, test/inductor/test_triton_kernels.py::MutationTests::test_out_of_order_kernel_call, test/inductor/test_triton_kernels.py::MutationTests::test_reduce_sum, test/inductor/test_triton_kernels.py::MutationTests::test_triton_kernel_inference_mode, test/inductor/test_triton_kernels.py::MutationTests::test_while_loop, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_False_dynamic_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_False_dynamic_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_True_dynamic_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_add_kernel_autotuned_True_dynamic_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_autotune_no_pre_or_post_hook_user_defined, test/inductor/test_triton_kernels.py::CustomOpTests::test_autotune_unbacked, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_meta, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_False_autotune_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_False_autotune_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_True_autotune_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_capture_triton_special_kwargs_dynamic_True_autotune_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_custom_op, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_mutable_custom_op, test/inductor/test_triton_kernels.py::CustomOpTests::test_preserves_strides_variant_triton_kernel, test/inductor/test_triton_kernels.py::CustomOpTests::test_subclass, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_dynamic_grid_no_recompile, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_False_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_heuristic_non_strict_True_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_False_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_non_strict_True_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_aot_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_aot_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_eager_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_eager_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_inductor_with_perf_model_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_prune_configs_by_recompile_backend_inductor_with_perf_model_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_aot_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_aot_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_eager_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_eager_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_inductor_autotune_at_compile_time_False, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_kernel_reset_to_zero_backend_inductor_autotune_at_compile_time_True, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_aot_eager, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_eager, test/inductor/test_triton_kernels.py::CustomOpTests::test_triton_single_autotune_backend_inductor, test/inductor/test_triton_kernels.py::CustomOpTests::test_wrap_triton_disabled_in_triton_op
2025-12-04T12:36:34.9024952Z 
2025-12-04T12:36:34.9025200Z Finished inductor/test_triton_kernels 1/1 ... [2025-12-04 12:36:34.878955][13434.807247143], took 2.26min
2025-12-04T12:36:34.9049780Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-31757cb9ac5c1c41.xml
2025-12-04T12:36:35.0500053Z Running inductor/test_lookup_table 1/1 ... [2025-12-04 12:36:35.049751][13434.978050062]
2025-12-04T12:36:35.0500528Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:36:35.0503186Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_lookup_table.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:36:35.050050]
2025-12-04T12:36:40.3321926Z 
2025-12-04T12:36:40.3322853Z inductor/test_lookup_table 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_lookup_table_1.1_5735430f25b64d36_.log
2025-12-04T12:36:40.3323536Z 
2025-12-04T12:36:40.3323828Z Finished inductor/test_lookup_table 1/1 ... [2025-12-04 12:36:40.331958][13440.260255284], took 0.09min
2025-12-04T12:36:40.3578349Z Running inductor/test_cutedsl_template 1/1 ... [2025-12-04 12:36:40.357585][13440.285884993]
2025-12-04T12:36:40.3578851Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:36:40.3582464Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutedsl_template.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:36:40.357902]
2025-12-04T12:36:45.7814972Z 
2025-12-04T12:36:45.7817563Z inductor/test_cutedsl_template 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutedsl_template_1.1_aa2751f33400ad4f_.log
2025-12-04T12:36:45.7825077Z Running 13 items in this shard: test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cse_integration, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_add_e2e, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_add_e2e_autotune, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_cutedsl_op_overrides, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_gen_defines, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_gen_imports, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_get_output_hook, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_indented_buffer_usage, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_modification_subgraph, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_multiple_templates_unique_names, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_render_includes_imports, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_template_aliasing, test/inductor/test_cutedsl_template.py::TestCuteDSLTemplate::test_template_env_contains_hooks
2025-12-04T12:36:45.7828982Z 
2025-12-04T12:36:45.7829235Z Finished inductor/test_cutedsl_template 1/1 ... [2025-12-04 12:36:45.781098][13445.709393071], took 0.09min
2025-12-04T12:36:45.8070139Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-0fae1aaa4a2003eb.xml
2025-12-04T12:36:45.9004352Z Running inductor/test_benchmark_fusion 1/1 ... [2025-12-04 12:36:45.900153][13445.828452745]
2025-12-04T12:36:45.9004853Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:36:45.9007504Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_benchmark_fusion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:36:45.900462]
2025-12-04T12:37:05.1504208Z 
2025-12-04T12:37:05.1505998Z inductor/test_benchmark_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_benchmark_fusion_1.1_e074574f7d298815_.log
2025-12-04T12:37:05.1513364Z Running 16 items in this shard: test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_avoid_register_spilling_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_foreach_kernel_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_register_spills_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_resnet18_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_softmax_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkFusionGpuTest::test_tield_kernel_fusion_cuda, test/inductor/test_benchmark_fusion.py::BenchmarkingTest::test_benchmark_on_non_zero_device, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_changed_layout, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_equivalent_extern_code, test/inductor/test_benchmark_fusion.py::BenchmarkMultiTemplateFusionGpuTest::test_equivalent_template_code, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_avoid_register_spilling_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_foreach_kernel_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_register_spills_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_resnet18_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_softmax_cpu, test/inductor/test_benchmark_fusion.py::BenchmarkFusionCpuTest::test_tield_kernel_fusion_cpu
2025-12-04T12:37:05.1518396Z 
2025-12-04T12:37:05.1518736Z Finished inductor/test_benchmark_fusion 1/1 ... [2025-12-04 12:37:05.149956][13465.078248761], took 0.32min
2025-12-04T12:37:05.1763301Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-bc092756daa3cb92.xml
2025-12-04T12:37:05.2611289Z Running export/test_serdes 1/1 ... [2025-12-04 12:37:05.260872][13465.189170985]
2025-12-04T12:37:05.2611725Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:37:05.2614385Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_serdes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:37:05.261180]
2025-12-04T12:39:51.3497880Z 
2025-12-04T12:39:51.3499196Z export/test_serdes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_serdes_1.1_e8663fe68509169e_.log
2025-12-04T12:39:51.3777908Z Running 880 items in this shard: test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_assume_static_by_default_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_constraints_error_not_in_range_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_constraints_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_inline_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_slice_maxsize_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_slice_unbacked_dim1_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_export_strict_narrow_unbacked_expr_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_no_grad_param_inplace_serdes_strict, test/export/test_serdes.py::SerDesExportTestDynamismExpression::test_reshape_view_backed_size_oblivious_serdes_strict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_assume_static_by_default_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_constraints_error_not_in_range_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_constraints_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_inline_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_slice_maxsize_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_slice_unbacked_dim1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_export_strict_narrow_unbacked_expr_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_no_grad_param_inplace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestDynamismExpression::test_reshape_view_backed_size_oblivious_serdes_nonstrict, test/export/test_serdes.py::SerDesExportTestExport::test__scaled_dot_product_flash_attention_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_additional_inputs_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_allow_explicit_guards_as_runtime_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_annotate_on_assert_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_args_type_checked_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_aten_lift_fresh_copy_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_attention_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_attr_assignment_extra_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_constrain_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_constant_relation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_linear_relation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_automatic_dynamic_shapes_simple_equality_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_baddbmm_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_non_strict_fake_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_non_strict_real_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_bincount_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_buffer_util_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_constructor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_constructor_torch_ir_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_capture_subclass_wrong_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_ccode_python_mod_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cdist_forward_compute_mode_zero_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_check_specialized_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_checks_to_constrain_range_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cleanup_dynamic_markers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_colin_unbacked_backed_vr_sub_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_colon_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_compiling_state_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_access_identical_symint_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_branches_return_constant_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_branches_return_same_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_contains_unbacked_no_escape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_int_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_with_module_stack_export_with_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cond_with_module_stack_export_with_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_aliasing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_input_naming_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_no_user_inp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_output_dup_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_requires_grad_const_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_return_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_with_non_functional_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constant_tensor_with_non_functional_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_in_eager_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_with_constrain_value_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_constrain_size_with_various_cases_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_conv_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_crop_like_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_cse_for_symint_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_functionalize_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_functionalize_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_auto_warn_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_op_preserve_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_custom_tag_metadata_re_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_batch_norm_functional_predispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_item_in_prim_after_decomposition_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_decomp_item_in_prim_before_decomposition_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_default_decomposition_core_cia_ops_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_1_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_integer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_repeat_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_out_of_order_simplified_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_derived_dim_repeat_derived_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_nonstrict_with_stacktrace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_detect_leak_strict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_gpu_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_mutation_float_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_device_to_static_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_1_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_auto_and_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_divisibility_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_dynamic_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_hint_range_violations_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dim_hint_ranges_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_disable_forced_specializations_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_disable_forced_specializations_ok_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_gather_into_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_gather_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_reduce_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_all_to_all_single_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_distributed_reduce_scatter_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dont_duck_size_for_auto_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_double_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_aliasing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_list_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_checks_mutation_with_nan_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_fake_kernel_inference_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_draft_export_infers_fake_kernel_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_duplicate_modules_with_non_persistent_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_lr_shift_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_bounds_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_builder_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_dataclass_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_inferred_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_generic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_user_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_serdes_various_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_spec_with_pytree_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_shapes_wrapped_with_shape_guards_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_dynamic_sym_round_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_ends_of_bounds_oblivious_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_enum_str_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_error_does_not_reference_eager_fallback_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_error_when_passing_mutating_primitive_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_exception_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_expand_copy_export_handles_implicit_true_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_api_with_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_as_backend_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_lifted_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_symbol_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_associative_scan_symbol_scandim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_aten_to_unflatten_subclass_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_symbool_pred_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cond_warns_constant_pred_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_decomp_table_basic_pop_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_decomp_table_container_methods_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_op_lib_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_triton_kernel_mutable_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_custom_triton_kernel_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_cyclic_reference_leak_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomp_torture_case_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomp_torture_case_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomps_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_decomps_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_dynamo_config_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_run_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_container_type_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_for_training_with_state_dict_hooks_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_default_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_keyword_only_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_pytree_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_keyword_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_keyword_pytree_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_func_with_var_postional_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_function_schema_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_graph_with_no_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_bug_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_dynamic_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_input_mutation_static_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_leak_compile_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_linear_preserve_dynamic_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_max_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_max_onnx_reported_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_mod_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_preserve_linear_at_aot_level_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_preserve_linear_but_not_custom_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_rnn_variants_with_warning_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_scan_pytree_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_script_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_statically_known_true_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_then_compile_tensor_ctor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_autocast_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_fake_tensor_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_inline_constraints_complex_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_inline_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_set_grad_enabled_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_export_with_wrong_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_external_call_non_strict_real_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fake_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fake_weights_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_filter_traceback_frames_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_flex_attention_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_float_conversion_from_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_float_conversion_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_fqn_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_from_node_metadata_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_full_on_scalar_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_function_holding_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_hints_wrapper_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_hoo_inline_users_issue_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_if_functional_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_if_post_autograd_op_preserved_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inductor_backend_inside_nonstrict_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_class_method_recursive_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_class_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_inline_script_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_int_shape_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_intermediate_shape_comp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_invalid_pytree_dynamo_graph_capture_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_exporting_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_is_nonzero_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_isnonzero_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_113041_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_157289_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_issue_161902_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_istft_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_invalid_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_linear_convd_for_training_ir_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_keep_composite_ops_linear_convd_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_kwarg_dynamic_shapes_diff_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_kwargs_reorder_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_layer_norm_unbacked_normalized_shape_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_layer_sharing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_lazy_module_kwargs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_linear_conv_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_malformed_fqn_from_source_name_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_map_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_map_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mask_nonzero_static_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_masked_select_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_math_pow_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mismatched_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_mixed_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_dict_key_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_input_subclasses_parameterization_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_list_slice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_module_with_dict_container_inp_out_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_modules_access_for_deleted_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_more_multidimensional_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multidimensional_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multinomial_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_multiple_definitions_same_name_dim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_namedtuple_input_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_native_multi_attention_head_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_dynamic_shapes_spec_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_fake_tensor_leak_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_constant_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_init_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nested_module_with_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nn_module_stack_shared_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_check_is_size_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_suggested_fixes_for_data_dependent_errors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_3_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_no_tensor_computation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_persistent_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_strict_dynamic_shapes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_non_strict_dynamic_shapes_suggested_fixes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_none_buffers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonstrict_retrace_preserves_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonzero_2_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_nonzero_dynamic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_not_registered_parameter_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_operator_aten_tensor_mode_variant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_output_node_name_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pad_sequence_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_param_util_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_partial_patched_forward_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_collisions_hoo_subgraphs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_collisions_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_order_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_naming_order_variadic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_placeholder_update_preserving_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_predispatch_cond_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_predispatch_grad_wrappers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_annotation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_module_call_signature_unflatten_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_requires_grad_placeholders_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_preserve_shape_dynamism_for_unused_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_profiling_code_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_python_asserts_with_sym_int_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pytree_register_data_class_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_pytree_register_nested_data_class_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_range_constraints_with_replacement_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_alias_dtype_mismatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_bool_cast_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_errors_on_aliasing_custom_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_for_max_op_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_real_tensor_size_mismatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_redundant_assert_max_upper_bound_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_redundant_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_refine_dynamic_shapes_from_suggested_fixes_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_register_constant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_repeat_interleave_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_replace_unbacked_with_very_large_upperbound_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_replaced_unbacked_bindings_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_reshape_view_helper_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_retracable_ep_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_retrace_pre_autograd_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decomposition_supports_user_input_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decompositions_keep_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_run_decompositions_keep_tensor_constant_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_for_prim_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_for_prm_str_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_runtime_assert_with_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sdpa_gqa_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sequential_slicing_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_example_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_as_side_effect_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_empty_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_set_grad_unflatten_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_setgrad_lifted_tensor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_shared_submodule_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_simple_export_for_training_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_simple_unbacked_view_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_size_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_slice_nn_module_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_solver_unsupported_sympy_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_specialize_derived_dim_roots_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_split_const_gm_with_lifted_constants_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_stack_trace_make_fx_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_stack_trace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_primitives_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_shape_attribute_assignment_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_state_tensors_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_static_dim_constraints_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_context_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_complicated_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_const_metadata_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclass_nested_attr_access_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclasses_parameterization_nested_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_subclasses_parameterization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggest_torch_checks_with_non_negative_check_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggest_torch_checks_with_regular_check_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_for_data_dependent_errors_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_suggested_fixes_new_roots_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_float_operators_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_or_sym_and_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_sym_sqrt_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symbool_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symfloat_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_additional_inputs_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_basic_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_ranges_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_shapes_collection_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_input_specialization_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_item_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_output_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_symint_tensor_return_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tag_ac_export_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_attribute_zero_args_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_constant_aten_to_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tensor_constant_with_wrapped_method_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_multiple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_to_module_with_mutated_buffer_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tolist_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_torch_check_eq_commutativity_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_torch_fn_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_trace_under_fake_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_train_eval_on_exported_preautograd_module_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_tril_dynamic_diagonal_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_triu_dynamic_diagonal_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_3d_matmul_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_bincount_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_bindings_for_divisible_u_symint_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_deferred_runtime_retrace_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_expand_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_infer_size_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_kth_value_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_linear_layer_norm_input_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_noncontig_lin_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_pad_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_scalar_constructor_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_slice_forward_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_slice_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_stack_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_to_cond_passthrough_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_to_cond_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unbacked_unsqueeze_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_asserts_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_buffer_update_child2parent_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_closure_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_isinstance_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_dispatch_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_shared_submodule_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_multiple_graphs_state_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_no_unroll_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_placeholder_update_child2parent_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_5_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_6_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_buf_8_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_const_preserving_3_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_const_preserving_3_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_6_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_9_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unflatten_random_dag_preserving_4_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unused_aliases_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_unused_constant_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_uplift_common_custom_meta_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_uplift_common_custom_meta_with_multiple_calls_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_use_embedding_twice_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_user_input_and_buffer_mutation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_custom_autograd_function_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_vmap_to_assert_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_where_decomp_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_assert_separation_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_index_assertions_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_simple_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_while_loop_tensor_constant_idx_serdes_strict, test/export/test_serdes.py::SerDesExportTestExport::test_wrapper_module_serdes_strict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test__scaled_dot_product_flash_attention_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_additional_inputs_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_allow_explicit_guards_as_runtime_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_annotate_on_assert_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_args_type_checked_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_aten_lift_fresh_copy_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_attention_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_attr_assignment_extra_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_constrain_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_constant_relation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_linear_relation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_automatic_dynamic_shapes_simple_equality_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_baddbmm_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_non_strict_fake_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_non_strict_real_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_bincount_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_buffer_util_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_constructor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_constructor_torch_ir_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_capture_subclass_wrong_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_ccode_python_mod_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cdist_forward_compute_mode_zero_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_check_specialized_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_checks_to_constrain_range_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cleanup_dynamic_markers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_colin_unbacked_backed_vr_sub_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_colon_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_compiling_state_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_access_identical_symint_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_branches_return_constant_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_branches_return_same_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_contains_unbacked_no_escape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_int_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_with_module_stack_export_with_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cond_with_module_stack_export_with_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_aliasing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_input_naming_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_no_user_inp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_output_dup_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_requires_grad_const_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_return_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_with_non_functional_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constant_tensor_with_non_functional_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_in_eager_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_with_constrain_value_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_constrain_size_with_various_cases_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_conv_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_crop_like_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_cse_for_symint_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_functionalize_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_functionalize_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_auto_warn_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_op_preserve_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_custom_tag_metadata_re_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_batch_norm_functional_predispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_item_in_prim_after_decomposition_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_decomp_item_in_prim_before_decomposition_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_default_decomposition_core_cia_ops_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_1_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_integer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_repeat_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_derived_dim_repeat_derived_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_nonstrict_with_stacktrace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_detect_leak_strict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_gpu_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_mutation_float_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_device_to_static_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_1_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_auto_and_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_divisibility_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_dynamic_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_hint_range_violations_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dim_hint_ranges_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_disable_forced_specializations_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_disable_forced_specializations_ok_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_gather_into_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_gather_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_reduce_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_all_to_all_single_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_distributed_reduce_scatter_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dont_duck_size_for_auto_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_double_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_aliasing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_list_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_checks_mutation_with_nan_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_fake_kernel_inference_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_draft_export_infers_fake_kernel_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_duplicate_modules_with_non_persistent_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_lr_shift_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_bounds_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_builder_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_dataclass_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_inferred_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_generic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_user_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_serdes_various_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_spec_with_pytree_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_shapes_wrapped_with_shape_guards_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_dynamic_sym_round_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_ends_of_bounds_oblivious_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_enum_str_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_error_does_not_reference_eager_fallback_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_error_when_passing_mutating_primitive_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_exception_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_expand_copy_export_handles_implicit_true_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_api_with_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_as_backend_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_lifted_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_symbol_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_associative_scan_symbol_scandim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_symbool_pred_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cond_warns_constant_pred_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_decomp_table_basic_pop_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_decomp_table_container_methods_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_op_lib_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_triton_kernel_mutable_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_custom_triton_kernel_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_cyclic_reference_leak_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomp_torture_case_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomp_torture_case_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomps_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_decomps_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_dynamo_config_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_run_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_container_type_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_for_training_with_state_dict_hooks_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_default_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_keyword_only_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_pytree_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_keyword_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_keyword_pytree_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_func_with_var_postional_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_function_schema_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_graph_with_no_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_bug_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_dynamic_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_input_mutation_static_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_leak_compile_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_linear_preserve_dynamic_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_max_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_max_onnx_reported_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_mod_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_preserve_linear_at_aot_level_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_preserve_linear_but_not_custom_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_rnn_variants_with_warning_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_scan_pytree_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_script_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_statically_known_true_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_then_compile_tensor_ctor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_autocast_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_fake_tensor_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_inline_constraints_complex_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_inline_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_set_grad_enabled_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_export_with_wrong_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_external_call_non_strict_real_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fake_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fake_weights_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_filter_traceback_frames_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_flex_attention_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_float_conversion_from_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_float_conversion_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_fqn_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_from_node_metadata_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_full_on_scalar_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_function_holding_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_hints_wrapper_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_hoo_inline_users_issue_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_if_functional_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_if_post_autograd_op_preserved_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inductor_backend_inside_nonstrict_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_class_method_recursive_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_class_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_inline_script_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_int_shape_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_intermediate_shape_comp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_invalid_pytree_dynamo_graph_capture_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_exporting_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_is_nonzero_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_isnonzero_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_113041_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_157289_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_issue_161902_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_istft_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_invalid_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_linear_convd_for_training_ir_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_keep_composite_ops_linear_convd_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_kwarg_dynamic_shapes_diff_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_kwargs_reorder_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_layer_norm_unbacked_normalized_shape_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_layer_sharing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_lazy_module_kwargs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_linear_conv_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_malformed_fqn_from_source_name_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_map_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_map_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mask_nonzero_static_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_masked_select_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_math_pow_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mismatched_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_mixed_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_dict_key_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_input_subclasses_parameterization_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_list_slice_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_module_with_dict_container_inp_out_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_modules_access_for_deleted_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_more_multidimensional_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multidimensional_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multinomial_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_multiple_definitions_same_name_dim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_namedtuple_input_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_native_multi_attention_head_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_dynamic_shapes_spec_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_fake_tensor_leak_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_constant_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_init_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nested_module_with_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nn_module_stack_shared_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_check_is_size_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_suggested_fixes_for_data_dependent_errors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_3_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_no_tensor_computation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_persistent_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_strict_dynamic_shapes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_non_strict_dynamic_shapes_suggested_fixes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_none_buffers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonstrict_retrace_preserves_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonzero_2_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_nonzero_dynamic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_not_registered_parameter_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_operator_aten_tensor_mode_variant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_output_node_name_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pad_sequence_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_param_util_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_partial_patched_forward_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_collisions_hoo_subgraphs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_collisions_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_order_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_naming_order_variadic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_placeholder_update_preserving_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_predispatch_cond_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_predispatch_grad_wrappers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_annotation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_module_call_signature_unflatten_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_requires_grad_placeholders_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_preserve_shape_dynamism_for_unused_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_profiling_code_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_python_asserts_with_sym_int_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pytree_register_data_class_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_pytree_register_nested_data_class_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_range_constraints_with_replacement_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_alias_dtype_mismatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_bool_cast_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_errors_on_aliasing_custom_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_for_max_op_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_real_tensor_size_mismatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_redundant_assert_max_upper_bound_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_redundant_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_refine_dynamic_shapes_from_suggested_fixes_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_register_constant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_repeat_interleave_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_replace_unbacked_with_very_large_upperbound_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_replaced_unbacked_bindings_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_reshape_view_helper_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_retracable_ep_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_retrace_pre_autograd_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decomposition_supports_user_input_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decompositions_keep_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_run_decompositions_keep_tensor_constant_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_for_prim_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_for_prm_str_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_runtime_assert_with_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sdpa_gqa_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sequential_slicing_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_example_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_as_side_effect_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_empty_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_set_grad_unflatten_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_setgrad_lifted_tensor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_shared_submodule_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_simple_export_for_training_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_simple_unbacked_view_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_size_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_slice_nn_module_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_solver_unsupported_sympy_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_specialize_derived_dim_roots_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_split_const_gm_with_lifted_constants_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_stack_trace_make_fx_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_stack_trace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_primitives_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_shape_attribute_assignment_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_state_tensors_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_static_dim_constraints_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_context_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_complicated_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclass_nested_attr_access_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclasses_parameterization_nested_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_subclasses_parameterization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggest_torch_checks_with_non_negative_check_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggest_torch_checks_with_regular_check_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_suggested_fixes_new_roots_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_float_operators_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_or_sym_and_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_sym_sqrt_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symbool_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symfloat_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_additional_inputs_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_basic_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_ranges_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_shapes_collection_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_input_specialization_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_item_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_output_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_symint_tensor_return_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tag_ac_export_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_attribute_zero_args_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_constant_aten_to_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tensor_constant_with_wrapped_method_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_to_module_with_mutated_buffer_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tolist_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_torch_check_eq_commutativity_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_torch_fn_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_trace_under_fake_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_train_eval_on_exported_preautograd_module_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_tril_dynamic_diagonal_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_triu_dynamic_diagonal_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_3d_matmul_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_bincount_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_bindings_for_divisible_u_symint_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_deferred_runtime_retrace_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_expand_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_infer_size_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_kth_value_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_linear_layer_norm_input_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_noncontig_lin_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_pad_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_scalar_constructor_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_slice_forward_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_slice_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_stack_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_to_cond_passthrough_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_to_cond_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unbacked_unsqueeze_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_asserts_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_buffer_update_child2parent_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_closure_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_isinstance_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_dispatch_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_shared_submodule_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_multiple_graphs_state_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_no_unroll_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_placeholder_update_child2parent_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_5_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_6_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_buf_8_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_6_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_9_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unflatten_random_dag_preserving_4_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unused_aliases_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_unused_constant_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_uplift_common_custom_meta_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_uplift_common_custom_meta_with_multiple_calls_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_use_embedding_twice_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_user_input_and_buffer_mutation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_custom_autograd_function_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_vmap_to_assert_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_where_decomp_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_assert_separation_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_index_assertions_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_simple_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_while_loop_tensor_constant_idx_serdes_nonstrict, test/export/test_serdes.py::SerDesExportNonStrictTestExport::test_wrapper_module_serdes_nonstrict
2025-12-04T12:39:51.4044117Z 
2025-12-04T12:39:51.4044446Z Finished export/test_serdes 1/1 ... [2025-12-04 12:39:51.351451][13631.27974505], took 2.77min
2025-12-04T12:39:51.4045239Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_serdes/export.test_serdes-707c3510a208c1b4.xml
2025-12-04T12:39:51.4860780Z Running inductor/test_control_deps 1/1 ... [2025-12-04 12:39:51.485816][13631.414113475]
2025-12-04T12:39:51.4861524Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:39:51.4864089Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_control_deps.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:39:51.486104]
2025-12-04T12:40:00.5667207Z 
2025-12-04T12:40:00.5668696Z inductor/test_control_deps 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_deps_1.1_8b7ff51ad1c7850a_.log
2025-12-04T12:40:00.5671085Z Running 1 items in this shard: test/inductor/test_control_deps.py::TestControlDeps::test_control_deps_prevents_fusion
2025-12-04T12:40:00.5672124Z 
2025-12-04T12:40:00.5672758Z Finished inductor/test_control_deps 1/1 ... [2025-12-04 12:40:00.566303][13640.494597899], took 0.15min
2025-12-04T12:40:00.5936950Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_control_deps/inductor.test_control_deps-c5f7586c63bf6a73.xml
2025-12-04T12:40:00.6625995Z Running inductor/test_benchmarking 1/1 ... [2025-12-04 12:40:00.662335][13640.590634834]
2025-12-04T12:40:00.6626481Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:40:00.6629033Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_benchmarking.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:40:00.662641]
2025-12-04T12:40:07.2383295Z 
2025-12-04T12:40:07.2384192Z inductor/test_benchmarking 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_benchmarking_1.1_829d05e5f8b311c4_.log
2025-12-04T12:40:07.2389454Z Running 12 items in this shard: test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_cpu_smoke_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_cpu_smoke_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_gpu_smoke_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_gpu_smoke_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_many_devices_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_many_devices_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_no_devices_benchmarker_cls0, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_safely_infers_device_no_devices_benchmarker_cls1, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls0_device_cpu, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls0_device_cuda, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls1_device_cpu, test/inductor/test_benchmarking.py::TestBenchmarker::test_benchmark_smoke_benchmarker_cls1_device_cuda
2025-12-04T12:40:07.2393689Z 
2025-12-04T12:40:07.2393945Z Finished inductor/test_benchmarking 1/1 ... [2025-12-04 12:40:07.238034][13647.166325401], took 0.11min
2025-12-04T12:40:07.2648462Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_benchmarking/inductor.test_benchmarking-a0634e57d5b5356a.xml
2025-12-04T12:40:07.3420559Z Running inductor/test_helion_kernels 1/1 ... [2025-12-04 12:40:07.341821][13647.270119503]
2025-12-04T12:40:07.3421033Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:40:07.3424177Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_helion_kernels.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:40:07.342137]
2025-12-04T12:40:12.7156760Z 
2025-12-04T12:40:12.7157977Z inductor/test_helion_kernels 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_helion_kernels_1.1_570e1b539a64331c_.log
2025-12-04T12:40:12.7159280Z Running 2 items in this shard: test/inductor/test_helion_kernels.py::HelionTests::test_add_kernel, test/inductor/test_helion_kernels.py::HelionTests::test_softmax_view_reshape
2025-12-04T12:40:12.7159933Z 
2025-12-04T12:40:12.7160330Z Finished inductor/test_helion_kernels 1/1 ... [2025-12-04 12:40:12.715429][13652.64372413], took 0.09min
2025-12-04T12:40:12.7429698Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_helion_kernels/inductor.test_helion_kernels-31b2988c053252a6.xml
2025-12-04T12:40:12.7719248Z Running inductor/test_quantization 1/1 ... [2025-12-04 12:40:12.771679][13652.699978231]
2025-12-04T12:40:12.7719732Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:40:12.7723119Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_quantization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:40:12.772006]
2025-12-04T12:40:24.7069868Z 
2025-12-04T12:40:24.7070935Z inductor/test_quantization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_quantization_1.1_121915a11d71492f_.log
2025-12-04T12:40:24.7072475Z Running 2 items in this shard: test/inductor/test_quantization.py::TestQuantization::test_activation_quantization_aten_with_scaling, test/inductor/test_quantization.py::TestQuantization::test_activation_quantization_aten_without_scaling
2025-12-04T12:40:24.7073410Z 
2025-12-04T12:40:24.7073702Z Finished inductor/test_quantization 1/1 ... [2025-12-04 12:40:24.706710][13664.635006436], took 0.20min
2025-12-04T12:40:24.7335903Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_quantization/inductor.test_quantization-84688fa1768a881f.xml
2025-12-04T12:40:24.8085738Z Running inductor/test_best_config 1/1 ... [2025-12-04 12:40:24.808331][13664.736629886]
2025-12-04T12:40:24.8086237Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:40:24.8089108Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_best_config.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:40:24.808647]
2025-12-04T12:40:32.0853792Z 
2025-12-04T12:40:32.0854884Z inductor/test_best_config 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_best_config_1.1_4e63c9241f189413_.log
2025-12-04T12:40:32.0855974Z Running 1 items in this shard: test/inductor/test_best_config.py::TestKernelBestConfig::test_best_config_has_triton_cache_key
2025-12-04T12:40:32.0856462Z 
2025-12-04T12:40:32.0856753Z Finished inductor/test_best_config 1/1 ... [2025-12-04 12:40:32.084945][13672.013234901], took 0.12min
2025-12-04T12:40:32.1123736Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_best_config/inductor.test_best_config-423c961bf0014ead.xml
2025-12-04T12:40:32.1853969Z Running export/test_tools 1/1 ... [2025-12-04 12:40:32.185159][13672.113457681]
2025-12-04T12:40:32.1854403Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:40:32.1857497Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_tools.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:40:32.185466]
2025-12-04T12:40:36.0061262Z 
2025-12-04T12:40:36.0062088Z export/test_tools 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_tools_1.1_3c90f78b66d684d8_.log
2025-12-04T12:40:36.0063288Z Running 2 items in this shard: test/export/test_tools.py::TestExportTools::test_report_exportability_basic, test/export/test_tools.py::TestExportTools::test_report_exportability_with_issues
2025-12-04T12:40:36.0064292Z 
2025-12-04T12:40:36.0064665Z Finished export/test_tools 1/1 ... [2025-12-04 12:40:36.005906][13675.934201134], took 0.06min
2025-12-04T12:40:36.0332040Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_tools/export.test_tools-21258e52645f11ac.xml
2025-12-04T12:40:36.0626181Z Running inductor/test_compiled_optimizers 1/3 ... [2025-12-04 12:40:36.062393][13675.990691408]
2025-12-04T12:40:36.0626664Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:40:36.0629422Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compiled_optimizers.py', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:40:36.062683]
2025-12-04T12:47:42.7006572Z 
2025-12-04T12:47:42.7007883Z inductor/test_compiled_optimizers 1/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compiled_optimizers_1.3_97f7ba3c63654c1d_.log
2025-12-04T12:47:42.7099179Z Running 248 items in this shard: test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_weight_decay_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_initial_accumulator_value_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_lr_decay_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_amsgrad_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_amsgrad_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_weight_decay_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_amsgrad_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_t0_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_weight_decay_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_weight_decay_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_closure_graph_break, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_foreach_map_adam, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_momentum_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_maximize_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_momentum_decay_decoupled_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_momentum_decay_decoupled_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_decoupled_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_decoupled_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_maximize_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_centered_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_maximize_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_step_sizes_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_recompile_single, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_ASGD_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adafactor_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adafactor_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adagrad_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adam_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adamax_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_LBFGS_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Muon_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_NAdam_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_RAdam_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_RMSprop_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Rprop_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_SparseAdam_use_closure_False_cuda_float32
2025-12-04T12:47:42.7187843Z 
2025-12-04T12:47:42.7188117Z Finished inductor/test_compiled_optimizers 1/3 ... [2025-12-04 12:47:42.700896][14102.629190424], took 7.11min
2025-12-04T12:47:42.7287132Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compiled_optimizers/inductor.test_compiled_optimizers-bb39f909a1ebabd7.xml
2025-12-04T12:47:42.8070619Z Running inductor/test_aot_inductor_custom_ops 1/1 ... [2025-12-04 12:47:42.806799][14102.735096986]
2025-12-04T12:47:42.8071455Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:47:42.8073901Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor_custom_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:47:42.807120]
2025-12-04T12:50:06.6132253Z 
2025-12-04T12:50:06.6133895Z inductor/test_aot_inductor_custom_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_custom_ops_1.1_dddd6a5d20b7cc0a_.log
2025-12-04T12:50:06.6148461Z Running 35 items in this shard: test/inductor/test_aot_inductor_custom_ops.py::AOTInductorLoggingTest::test_shape_env_reuse, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_boxed_run_inputs_clearing_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_add_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_add_output_path_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_all_inputs_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_missing_arg_with_default_value_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_out_variant_without_return_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_return_list_of_single_tensor_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_return_single_tensor_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_square_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_with_concat_inputs_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_with_multiple_outputs_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_custom_op_with_reinterpret_view_inputs_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_fn_with_int_output_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_fn_with_optional_tensor_nullopt_output_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_fn_with_optional_tensor_output_2_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_fn_with_optional_tensor_output_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleCpu::test_incorrect_custom_op_schema_cpu, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_boxed_run_inputs_clearing_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_add_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_add_output_path_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_all_inputs_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_missing_arg_with_default_value_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_out_variant_without_return_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_return_list_of_single_tensor_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_return_single_tensor_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_square_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_with_concat_inputs_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_with_multiple_outputs_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_custom_op_with_reinterpret_view_inputs_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_fn_with_int_output_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_fn_with_optional_tensor_nullopt_output_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_fn_with_optional_tensor_output_2_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_fn_with_optional_tensor_output_cuda, test/inductor/test_aot_inductor_custom_ops.py::AOTInductorTestABICompatibleGpu::test_incorrect_custom_op_schema_cuda
2025-12-04T12:50:06.6160704Z 
2025-12-04T12:50:06.6160971Z Finished inductor/test_aot_inductor_custom_ops 1/1 ... [2025-12-04 12:50:06.613176][14246.541472903], took 2.40min
2025-12-04T12:50:06.6408044Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor_custom_ops/inductor.test_aot_inductor_custom_ops-a7a529277f9f9a31.xml
2025-12-04T12:50:06.7211840Z Running inductor/test_control_flow 4/5 ... [2025-12-04 12:50:06.720932][14246.64923176]
2025-12-04T12:50:06.7212301Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:50:06.7215090Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_control_flow.py', '--shard-id=4', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:50:06.721231]
2025-12-04T12:57:58.2014294Z 
2025-12-04T12:57:58.2059952Z inductor/test_control_flow 4/5 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_flow_4.5_21e8d4f49de459e9_.log
2025-12-04T12:57:58.2108133Z Running 129 items in this shard: test/inductor/test_control_flow.py::CondTests::test_cond_advanced_dynamic_shapes_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_nested_control_flow_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_select_with_input_idx_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_simple_control_flow_device_cuda_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_infinite_loop_error, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_models_with_mixed_device_device_cuda, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cuda_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_mismatch_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_generic_backend_inductor_cpu, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cuda_dynamic_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_2_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_2_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_0_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cuda_dynamic_True_autograd_True
2025-12-04T12:57:58.2205692Z 
2025-12-04T12:57:58.2207435Z Finished inductor/test_control_flow 4/5 ... [2025-12-04 12:57:58.220483][14718.14877457], took 7.86min
2025-12-04T12:57:58.2477207Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-955a1f614439a238.xml
2025-12-04T12:57:59.2872948Z Uploading artifacts took 0.95 seconds
2025-12-04T12:57:59.2876953Z Running dynamo/test_cudagraphs 1/1 ... [2025-12-04 12:57:59.287483][14719.215779635]
2025-12-04T12:57:59.2877665Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:57:59.2881198Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_cudagraphs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:57:59.287857]
2025-12-04T12:58:04.4617621Z 
2025-12-04T12:58:04.4618638Z dynamo/test_cudagraphs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_cudagraphs_1.1_90173d668b3e025f_.log
2025-12-04T12:58:04.4621531Z Running 8 items in this shard: test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_basic, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_dead_fill, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_dtoh, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_factory, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_htod, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_mutate_constant, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_mutate_input, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_mutated_metadata
2025-12-04T12:58:04.4623764Z 
2025-12-04T12:58:04.4624059Z Finished dynamo/test_cudagraphs 1/1 ... [2025-12-04 12:58:04.461496][14724.38979316], took 0.09min
2025-12-04T12:58:04.4890310Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_cudagraphs/dynamo.test_cudagraphs-e1e65156309c3950.xml
2025-12-04T12:58:04.5234012Z Running inductor/test_alignment 1/1 ... [2025-12-04 12:58:04.523166][14724.451465504]
2025-12-04T12:58:04.5234639Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:58:04.5237287Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_alignment.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:58:04.523475]
2025-12-04T12:58:17.2109759Z 
2025-12-04T12:58:17.2111052Z inductor/test_alignment 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_alignment_1.1_a164af8fc87f74b5_.log
2025-12-04T12:58:17.2390145Z Running 12 items in this shard: test/inductor/test_alignment.py::GPUTests::test_Q4_K_dequantization_cuda, test/inductor/test_alignment.py::GPUTests::test_alignment_without_custom_op_cuda, test/inductor/test_alignment.py::GPUTests::test_incorrect_meta_for_custom_op_2d_cuda, test/inductor/test_alignment.py::GPUTests::test_no_align_for_custom_op_2d_cuda, test/inductor/test_alignment.py::GPUTests::test_no_align_for_custom_op_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_view_dtype_size_1024_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_view_dtype_size_1048576_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_view_dtype_size_128_cuda, test/inductor/test_alignment.py::GPUTests::test_unaligned_input_2d_cuda, test/inductor/test_alignment.py::GPUTests::test_unaligned_input_cuda, test/inductor/test_alignment.py::GPUTests::test_view_dtype_slice_cuda
2025-12-04T12:58:17.2394741Z 
2025-12-04T12:58:17.2395139Z Finished inductor/test_alignment 1/1 ... [2025-12-04 12:58:17.210639][14737.138930421], took 0.21min
2025-12-04T12:58:17.2396414Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_alignment/inductor.test_alignment-e4d33245d4b5500b.xml
2025-12-04T12:58:17.3236417Z Running dynamo/test_guard_serialization 1/1 ... [2025-12-04 12:58:17.323362][14737.251660909]
2025-12-04T12:58:17.3237094Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:58:17.3239852Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_guard_serialization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:58:17.323698]
2025-12-04T12:58:31.3646902Z 
2025-12-04T12:58:31.3648106Z dynamo/test_guard_serialization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_guard_serialization_1.1_1b7b87cb0989e9eb_.log
2025-12-04T12:58:31.3664243Z Running 56 items in this shard: test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bool_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_method_input, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_method_patched_forward, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_methods_empty, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_methods_missing, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_builtin_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_c10d_work, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_class_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_closure_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_closure_var_missing, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_constant_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_ddp_module, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_default_device, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_deterministic_algorithms, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_contains, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_keys_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_keys_serialization, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_version, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dispatch_key_set_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dual_level, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_duplicate_input, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_empty_nn_module_hooks_dict, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_equals_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_fsdp_training_state, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_function_locals, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_function_with_wrong_fqn, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_functorch_stack_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_global_state_guard_filter, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_grad_mode, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_grad_mode_loading, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_guard_on_key_order_with_cache, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_hasattr_serialization, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_id_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_id_match_with_config, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_mapping_keys_check, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_nn_module, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_none_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_not_present_in_generic_dict, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_range_iterator_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_sdp_backend_serialization, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_sequence_length, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_shape_env, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_skipped_objects, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tensor_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tensor_subclass_metadata_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_torch_function_state, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_torch_function_state_filter, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tuple_iterator_len, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_type_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unserializable_sharded_tensor, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unserializable_submodule, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unused_process_group, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unused_stream, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unused_weakref, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_weakref_alive, test/dynamo/test_guard_serialization.py::TestGuardSerializationFSDP::test_guard_serialization_fsdp_module
2025-12-04T12:58:31.3679078Z 
2025-12-04T12:58:31.3679327Z Finished dynamo/test_guard_serialization 1/1 ... [2025-12-04 12:58:31.364396][14751.292682352], took 0.23min
2025-12-04T12:58:31.3934422Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_guard_serialization/dynamo.test_guard_serialization-212782fb09fba1b6.xml
2025-12-04T12:58:31.4686186Z Running inductor/test_needs_exact_strides 1/1 ... [2025-12-04 12:58:31.468377][14751.39667498]
2025-12-04T12:58:31.4686686Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:58:31.4689567Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_needs_exact_strides.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:58:31.468687]
2025-12-04T12:58:40.7491328Z 
2025-12-04T12:58:40.7492951Z inductor/test_needs_exact_strides 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_needs_exact_strides_1.1_5109866c57595440_.log
2025-12-04T12:58:40.7494799Z Running 2 items in this shard: test/inductor/test_needs_exact_strides.py::TestNeedsExactStrides::test_custom_op_float32, test/inductor/test_needs_exact_strides.py::TestNeedsExactStrides::test_custom_op_float8_e8m0fnu
2025-12-04T12:58:40.7495444Z 
2025-12-04T12:58:40.7495704Z Finished inductor/test_needs_exact_strides 1/1 ... [2025-12-04 12:58:40.748774][14760.677066773], took 0.15min
2025-12-04T12:58:40.7773727Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_needs_exact_strides/inductor.test_needs_exact_strides-9df9f4b52deeb26d.xml
2025-12-04T12:58:40.8651353Z Running inductor/test_auto_functionalize 1/1 ... [2025-12-04 12:58:40.864883][14760.793181452]
2025-12-04T12:58:40.8651874Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:58:40.8654926Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_auto_functionalize.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:58:40.865212]
2025-12-04T12:59:03.9199293Z 
2025-12-04T12:59:03.9202313Z inductor/test_auto_functionalize 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_auto_functionalize_1.1_68ce77f985508c50_.log
2025-12-04T12:59:03.9225587Z Running 39 items in this shard: test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias2_dynamic, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias_id_input_to_custom_op, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_alias_id_output, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_can_with_default, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_can_with_none_return, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra1, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra3, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra4, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_extra5, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_old, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_on_view, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_optional_old, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_optional_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_self_as_mutate_arg, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_tensorlist, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_with_returns_old, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_auto_functionalize_with_returns_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_can_auto_functionalize, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_dynamic2_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_dynamic3_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_dynamic_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_graph_input_is_view, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode1_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode2_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode3_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode4_v2, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_inference_mode_view, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_recompile, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_scheduling_with_multiple_mutates, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_slice, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_slice_dynamic, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_split, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_split_dynamic, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_try_use_slice, test/inductor/test_auto_functionalize.py::AutoFunctionalizeTests::test_unbacked_auto_functionalize_op
2025-12-04T12:59:03.9245908Z 
2025-12-04T12:59:03.9246365Z Finished inductor/test_auto_functionalize 1/1 ... [2025-12-04 12:59:03.919598][14783.847889142], took 0.38min
2025-12-04T12:59:03.9507058Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_auto_functionalize/inductor.test_auto_functionalize-2721b330ad87dcbb.xml
2025-12-04T12:59:04.0322600Z Running dynamo/test_modes 1/1 ... [2025-12-04 12:59:04.032007][14783.960305363]
2025-12-04T12:59:04.0323093Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:59:04.0325714Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_modes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:59:04.032311]
2025-12-04T12:59:23.4307116Z 
2025-12-04T12:59:23.4309136Z dynamo/test_modes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_modes_1.1_8c4add47a33e36b7_.log
2025-12-04T12:59:23.4319429Z Running 29 items in this shard: test/dynamo/test_modes.py::TorchDispatchModeTests::test_skip_torch_dispatch_modes, test/dynamo/test_modes.py::TorchDispatchModeTests::test_torch_dispatch_ignore_compile_internals, test/dynamo/test_modes.py::TorchFunctionModeTests::test_builtin_equivalent_funcs, test/dynamo/test_modes.py::TorchFunctionModeTests::test_error_empty_stack_pop_torch_function_mode, test/dynamo/test_modes.py::TorchFunctionModeTests::test_expand, test/dynamo/test_modes.py::TorchFunctionModeTests::test_flex_attention, test/dynamo/test_modes.py::TorchFunctionModeTests::test_hop, test/dynamo/test_modes.py::TorchFunctionModeTests::test_hop_eager, test/dynamo/test_modes.py::TorchFunctionModeTests::test_intermedate_torch_function_mode_construction_mutation, test/dynamo/test_modes.py::TorchFunctionModeTests::test_is_torch_function_all_disabled, test/dynamo/test_modes.py::TorchFunctionModeTests::test_len_torch_function_mode, test/dynamo/test_modes.py::TorchFunctionModeTests::test_nested_torch_function_mode, test/dynamo/test_modes.py::TorchFunctionModeTests::test_pop_torch_function_mode, test/dynamo/test_modes.py::TorchFunctionModeTests::test_push_torch_function_mode, test/dynamo/test_modes.py::TorchFunctionModeTests::test_register_hook, test/dynamo/test_modes.py::TorchFunctionModeTests::test_stack_state_clear_default_device, test/dynamo/test_modes.py::TorchFunctionModeTests::test_stack_state_mutation_default_device, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_and_pop_graph_break, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_and_pop_graph_break_mutation, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_disable, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_enabled_guard, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_enter_exit, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_graph_break, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_guards_cpp, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_guards_py, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_highest_priority, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_preserves_cuda_rng_state, test/dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_restore_on_exc, test/dynamo/test_modes.py::TorchFunctionModeLifecycleTests::test_default_device_restored_after_mode_tests
2025-12-04T12:59:23.4327141Z 
2025-12-04T12:59:23.4327364Z Finished dynamo/test_modes 1/1 ... [2025-12-04 12:59:23.430255][14803.358547017], took 0.32min
2025-12-04T12:59:23.4596289Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_modes/dynamo.test_modes-84ffae5b03f7e325.xml
2025-12-04T12:59:23.5485698Z Running inductor/test_custom_partitioner_fn 1/1 ... [2025-12-04 12:59:23.548322][14803.476619996]
2025-12-04T12:59:23.5486209Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:59:23.5488899Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_custom_partitioner_fn.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:59:23.548628]
2025-12-04T12:59:32.6288788Z 
2025-12-04T12:59:32.6289791Z inductor/test_custom_partitioner_fn 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_custom_partitioner_fn_1.1_c3b6cd6a5d7fdcc1_.log
2025-12-04T12:59:32.6290980Z Running 1 items in this shard: test/inductor/test_custom_partitioner_fn.py::TestCustomPartitionerFn::test_custom_partitioner_fn
2025-12-04T12:59:32.6291485Z 
2025-12-04T12:59:32.6291797Z Finished inductor/test_custom_partitioner_fn 1/1 ... [2025-12-04 12:59:32.628529][14812.556821916], took 0.15min
2025-12-04T12:59:32.6568247Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_custom_partitioner_fn/inductor.test_custom_partitioner_fn-7cdf0df133f39710.xml
2025-12-04T12:59:32.7419222Z Running dynamo/test_debug_utils 1/1 ... [2025-12-04 12:59:32.741656][14812.669954453]
2025-12-04T12:59:32.7419677Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:59:32.7422563Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_debug_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:59:32.741981]
2025-12-04T12:59:36.5632163Z 
2025-12-04T12:59:36.5633015Z dynamo/test_debug_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_debug_utils_1.1_95c2c78decfe726b_.log
2025-12-04T12:59:36.5635332Z Running 4 items in this shard: test/dynamo/test_debug_utils.py::TestDebugUtilsCUDA::test_cast_model_to_fp64_dtype_args_cuda, test/dynamo/test_debug_utils.py::TestDebugUtilsCUDA::test_generate_env_vars_string_cuda, test/dynamo/test_debug_utils.py::TestDebugUtilsDeviceCUDA::test_aot_graph_parser_cuda, test/dynamo/test_debug_utils.py::TestDebugUtilsDeviceCUDA::test_sym_aot_graph_parser_cuda
2025-12-04T12:59:36.5636734Z 
2025-12-04T12:59:36.5637014Z Finished dynamo/test_debug_utils 1/1 ... [2025-12-04 12:59:36.562963][14816.491259773], took 0.06min
2025-12-04T12:59:36.5911018Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_debug_utils/dynamo.test_debug_utils-34b85abff78e1075.xml
2025-12-04T12:59:36.6399660Z Running dynamo/test_base_hop 1/1 ... [2025-12-04 12:59:36.639728][14816.568026338]
2025-12-04T12:59:36.6400217Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:59:36.6402995Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_base_hop.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:59:36.640041]
2025-12-04T12:59:41.5128037Z 
2025-12-04T12:59:41.5129349Z dynamo/test_base_hop 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_base_hop_1.1_b2087f66aa3d673c_.log
2025-12-04T12:59:41.5132789Z Running 11 items in this shard: test/dynamo/test_base_hop.py::BaseHOPTest::test_aliasing_mutation_error, test/dynamo/test_base_hop.py::BaseHOPTest::test_aot_eager, test/dynamo/test_base_hop.py::BaseHOPTest::test_auto_functionalize, test/dynamo/test_base_hop.py::BaseHOPTest::test_dynamo, test/dynamo/test_base_hop.py::BaseHOPTest::test_eager_call, test/dynamo/test_base_hop.py::BaseHOPTest::test_int_input, test/dynamo/test_base_hop.py::BaseHOPTest::test_none_input, test/dynamo/test_base_hop.py::BaseHOPTest::test_schema_gen_pytree_in_out, test/dynamo/test_base_hop.py::BaseHOPTest::test_schema_gen_pytree_in_out_with_mutation, test/dynamo/test_base_hop.py::BaseHOPTest::test_schema_gen_single_return, test/dynamo/test_base_hop.py::BaseHOPTest::test_schema_gen_single_return_with_mutation
2025-12-04T12:59:41.5134912Z 
2025-12-04T12:59:41.5135118Z Finished dynamo/test_base_hop 1/1 ... [2025-12-04 12:59:41.512373][14821.440668173], took 0.08min
2025-12-04T12:59:41.5408423Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_base_hop/dynamo.test_base_hop-f16fc4276458e68a.xml
2025-12-04T12:59:41.5726309Z Running dynamo/test_export 1/1 ... [2025-12-04 12:59:41.572385][14821.500685041]
2025-12-04T12:59:41.5726752Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:59:41.5729394Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_export.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:59:41.572684]
2025-12-04T13:00:07.0287594Z 
2025-12-04T13:00:07.0288880Z dynamo/test_export 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_export_1.1_b7c3be727fa89598_.log
2025-12-04T13:00:07.0332519Z Running 186 items in this shard: test/dynamo/test_export.py::ExportTests::test_access_class_method_from_user_class_attr, test/dynamo/test_export.py::ExportTests::test_access_class_method_from_user_class_builtin, test/dynamo/test_export.py::ExportTests::test_byte_tensor_does_not_crash, test/dynamo/test_export.py::ExportTests::test_capture_symbolic_tracing_simple_within_fake_mode, test/dynamo/test_export.py::ExportTests::test_capture_symbolic_tracing_within_fake_mode, test/dynamo/test_export.py::ExportTests::test_cond_free_variables_overlapping, test/dynamo/test_export.py::ExportTests::test_cond_op_param_buffer_lifted, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_branch_args_mismatch, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_branch_return_multiple_tensors, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_branch_return_non_tensor, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_mismatch_return_length, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_mismatch_return_tensor_meta, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_missing_args, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_non_list_operands, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_non_tensor_operands, test/dynamo/test_export.py::ExportTests::test_cond_raise_user_error_on_unsupported_pred, test/dynamo/test_export.py::ExportTests::test_cond_supported_pred_types, test/dynamo/test_export.py::ExportTests::test_constraint_violation_error_messages, test/dynamo/test_export.py::ExportTests::test_dataclass_input_output, test/dynamo/test_export.py::ExportTests::test_dict_return, test/dynamo/test_export.py::ExportTests::test_dict_return_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_dupes, test/dynamo/test_export.py::ExportTests::test_dupes_2, test/dynamo/test_export.py::ExportTests::test_dupes_2_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass_reorder_with_non_tensor_arg, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass_reorder_with_non_tensor_arg_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass_with_non_tensor_arg, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass_with_non_tensor_arg_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass_with_non_tensor_output, test/dynamo/test_export.py::ExportTests::test_dupes_and_bypass_with_non_tensor_output_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_dupes_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_dynamic_slicing, test/dynamo/test_export.py::ExportTests::test_dynamic_slicing_simple, test/dynamo/test_export.py::ExportTests::test_dynamo_enum_in_tuple, test/dynamo/test_export.py::ExportTests::test_dynamo_list_index, test/dynamo/test_export.py::ExportTests::test_empty, test/dynamo/test_export.py::ExportTests::test_enforce_equalities, test/dynamo/test_export.py::ExportTests::test_export, test/dynamo/test_export.py::ExportTests::test_export_compare_optimize_with_make_fx, test/dynamo/test_export.py::ExportTests::test_export_cond_in_aten_symbolic, test/dynamo/test_export.py::ExportTests::test_export_control_flow_with_getattr, test/dynamo/test_export.py::ExportTests::test_export_decomp, test/dynamo/test_export.py::ExportTests::test_export_decomp_asserts_bad_args, test/dynamo/test_export.py::ExportTests::test_export_defaults_ok, test/dynamo/test_export.py::ExportTests::test_export_dynamic_control_flow_error, test/dynamo/test_export.py::ExportTests::test_export_dynamic_dim_cleanup, test/dynamo/test_export.py::ExportTests::test_export_dynamic_dim_not_1, test/dynamo/test_export.py::ExportTests::test_export_dynamic_dim_range_constraint, test/dynamo/test_export.py::ExportTests::test_export_graph_bypass, test/dynamo/test_export.py::ExportTests::test_export_graph_bypass_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_export_graph_with_complex_reorder, test/dynamo/test_export.py::ExportTests::test_export_graph_with_complex_reorder_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_export_graph_with_list, test/dynamo/test_export.py::ExportTests::test_export_graph_with_list_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_export_identity, test/dynamo/test_export.py::ExportTests::test_export_masking_with_no_grad, test/dynamo/test_export.py::ExportTests::test_export_meta, test/dynamo/test_export.py::ExportTests::test_export_meta_val, test/dynamo/test_export.py::ExportTests::test_export_mismatched_out, test/dynamo/test_export.py::ExportTests::test_export_mismatched_out_2, test/dynamo/test_export.py::ExportTests::test_export_mismatched_out_2_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_export_mismatched_out_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_export_module_specify_constraints_signature, test/dynamo/test_export.py::ExportTests::test_export_multi_dynamic_dim_constraint, test/dynamo/test_export.py::ExportTests::test_export_multi_dynamic_dim_unsafe_relationship, test/dynamo/test_export.py::ExportTests::test_export_nn_module_stack_patched_module, test/dynamo/test_export.py::ExportTests::test_export_no_raise, test/dynamo/test_export.py::ExportTests::test_export_no_tensor_computation_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_export_pass_arg_by_name, test/dynamo/test_export.py::ExportTests::test_export_pass_arg_by_name_star_args, test/dynamo/test_export.py::ExportTests::test_export_persist_assert, test/dynamo/test_export.py::ExportTests::test_export_preserve_constraints_as_metadata_tensor, test/dynamo/test_export.py::ExportTests::test_export_preserves_nn_module_stack_for_get_attr, test/dynamo/test_export.py::ExportTests::test_export_raise_guard_full_constraint, test/dynamo/test_export.py::ExportTests::test_export_raise_guard_partial_constraint, test/dynamo/test_export.py::ExportTests::test_export_raise_on_relationship, test/dynamo/test_export.py::ExportTests::test_export_shape_control_flow_1, test/dynamo/test_export.py::ExportTests::test_export_specialized_int, test/dynamo/test_export.py::ExportTests::test_export_symbolic_shape, test/dynamo/test_export.py::ExportTests::test_export_with_args_and_empty_kwargs, test/dynamo/test_export.py::ExportTests::test_export_with_args_with_default_None, test/dynamo/test_export.py::ExportTests::test_export_with_args_with_default_float, test/dynamo/test_export.py::ExportTests::test_export_with_args_with_default_tensor, test/dynamo/test_export.py::ExportTests::test_export_with_args_with_default_tuple, test/dynamo/test_export.py::ExportTests::test_export_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_export_with_builtin_op_on_assume_constant, test/dynamo/test_export.py::ExportTests::test_export_with_cond_branches_calling_methods, test/dynamo/test_export.py::ExportTests::test_export_with_cond_closure, test/dynamo/test_export.py::ExportTests::test_export_with_cond_dynamic_shape_pred, test/dynamo/test_export.py::ExportTests::test_export_with_cond_with_closed_function, test/dynamo/test_export.py::ExportTests::test_export_with_constant_dict_values, test/dynamo/test_export.py::ExportTests::test_export_with_constant_free_function, test/dynamo/test_export.py::ExportTests::test_export_with_constant_free_function_and_class_method, test/dynamo/test_export.py::ExportTests::test_export_with_constant_free_function_and_class_method_multiarg, test/dynamo/test_export.py::ExportTests::test_export_with_constant_free_function_and_class_method_multiarg_diff, test/dynamo/test_export.py::ExportTests::test_export_with_constant_global_function, test/dynamo/test_export.py::ExportTests::test_export_with_constant_in_unspecialized_nn_module, test/dynamo/test_export.py::ExportTests::test_export_with_constant_list_nonzero, test/dynamo/test_export.py::ExportTests::test_export_with_constant_list_nonzero_free_function, test/dynamo/test_export.py::ExportTests::test_export_with_constant_method_on_module, test/dynamo/test_export.py::ExportTests::test_export_with_constant_method_on_module_invoke_twice, test/dynamo/test_export.py::ExportTests::test_export_with_constant_none_control_flow, test/dynamo/test_export.py::ExportTests::test_export_with_constant_none_control_flow_free_func, test/dynamo/test_export.py::ExportTests::test_export_with_constant_not_none_control_flow, test/dynamo/test_export.py::ExportTests::test_export_with_constant_not_none_control_flow_free_func, test/dynamo/test_export.py::ExportTests::test_export_with_constant_not_none_control_flow_pos, test/dynamo/test_export.py::ExportTests::test_export_with_constant_not_return_const, test/dynamo/test_export.py::ExportTests::test_export_with_constant_tuple_nonzero, test/dynamo/test_export.py::ExportTests::test_export_with_functools_wrapped_fn, test/dynamo/test_export.py::ExportTests::test_export_with_functools_wrapped_method, test/dynamo/test_export.py::ExportTests::test_export_with_kwargs, test/dynamo/test_export.py::ExportTests::test_export_with_kwargs_and_empty_args, test/dynamo/test_export.py::ExportTests::test_export_with_kwargs_with_default_None, test/dynamo/test_export.py::ExportTests::test_export_with_kwargs_with_default_float, test/dynamo/test_export.py::ExportTests::test_export_with_kwargs_with_default_tensor, test/dynamo/test_export.py::ExportTests::test_export_with_kwargs_with_default_tuple, test/dynamo/test_export.py::ExportTests::test_export_with_map_cond, test/dynamo/test_export.py::ExportTests::test_export_with_map_zero_sized_tensor, test/dynamo/test_export.py::ExportTests::test_export_with_map_zero_sized_tensor_suppress_errors, test/dynamo/test_export.py::ExportTests::test_export_with_module_layer, test/dynamo/test_export.py::ExportTests::test_export_with_nonzero_static, test/dynamo/test_export.py::ExportTests::test_export_with_shallow_list_copy_with_side_effects, test/dynamo/test_export.py::ExportTests::test_export_with_shallow_list_copy_wo_side_effects, test/dynamo/test_export.py::ExportTests::test_export_with_stack_trace, test/dynamo/test_export.py::ExportTests::test_export_with_symbool_inputs, test/dynamo/test_export.py::ExportTests::test_export_with_wrapped_fn, test/dynamo/test_export.py::ExportTests::test_exported_graph_serialization, test/dynamo/test_export.py::ExportTests::test_func_return, test/dynamo/test_export.py::ExportTests::test_func_return_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_fx_pytree, test/dynamo/test_export.py::ExportTests::test_immutable_list_dict, test/dynamo/test_export.py::ExportTests::test_input_container_type, test/dynamo/test_export.py::ExportTests::test_input_global, test/dynamo/test_export.py::ExportTests::test_input_global_multiple_access, test/dynamo/test_export.py::ExportTests::test_input_nonlocal, test/dynamo/test_export.py::ExportTests::test_input_unused_nonlocal_ok, test/dynamo/test_export.py::ExportTests::test_list_contains, test/dynamo/test_export.py::ExportTests::test_list_not_contains, test/dynamo/test_export.py::ExportTests::test_list_unpack, test/dynamo/test_export.py::ExportTests::test_list_unpack_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_map_cond_param_buffer_lifted, test/dynamo/test_export.py::ExportTests::test_mixed_real_and_fake_inputs, test/dynamo/test_export.py::ExportTests::test_multiple_outputs_op_with_evaluator, test/dynamo/test_export.py::ExportTests::test_nested_cond_op_param_buffer_lifted, test/dynamo/test_export.py::ExportTests::test_no_tensor_computation, test/dynamo/test_export.py::ExportTests::test_no_tensor_computation_2, test/dynamo/test_export.py::ExportTests::test_no_tensor_computation_2_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_no_tensor_computation_fail, test/dynamo/test_export.py::ExportTests::test_not_functionalize, test/dynamo/test_export.py::ExportTests::test_param_buffer_safe_from_mutation_recurse, test/dynamo/test_export.py::ExportTests::test_param_buffer_safe_from_mutation_simple, test/dynamo/test_export.py::ExportTests::test_pre_dispatch_simple, test/dynamo/test_export.py::ExportTests::test_predispatch_with_for_out_dtype, test/dynamo/test_export.py::ExportTests::test_predispatch_with_for_out_dtype_nested, test/dynamo/test_export.py::ExportTests::test_predispatch_with_higher_order, test/dynamo/test_export.py::ExportTests::test_predispatch_with_higher_order_nested, test/dynamo/test_export.py::ExportTests::test_preserve_fx_node_metadata, test/dynamo/test_export.py::ExportTests::test_preserve_fx_node_metadata_graph_break, test/dynamo/test_export.py::ExportTests::test_preserve_fx_node_metadata_inline, test/dynamo/test_export.py::ExportTests::test_preserve_fx_node_metadata_recompile, test/dynamo/test_export.py::ExportTests::test_remove_redundant_dynamic_dim_in_error_message, test/dynamo/test_export.py::ExportTests::test_retracibility, test/dynamo/test_export.py::ExportTests::test_retracibility_dict_container_inp_out, test/dynamo/test_export.py::ExportTests::test_retracibility_nested_list_out, test/dynamo/test_export.py::ExportTests::test_round_dynamic_shapes, test/dynamo/test_export.py::ExportTests::test_strict_fake_tensor_prop_real_tensors, test/dynamo/test_export.py::ExportTests::test_subclass_parameters, test/dynamo/test_export.py::ExportTests::test_sum_param, test/dynamo/test_export.py::ExportTests::test_sym_contains, test/dynamo/test_export.py::ExportTests::test_symbolic_tracing_within_fake_mode_with_constraints, test/dynamo/test_export.py::ExportTests::test_symbolic_tracing_within_fake_mode_with_constraints_with_parameters, test/dynamo/test_export.py::ExportTests::test_symbool, test/dynamo/test_export.py::ExportTests::test_torch_inference_mode_ctx, test/dynamo/test_export.py::ExportTests::test_trivial_constraint, test/dynamo/test_export.py::ExportTests::test_uncaptured_higher_order_op_error_not_suppresed, test/dynamo/test_export.py::ExportTests::test_untracked_inputs_in_constraints, test/dynamo/test_export.py::ExportTests::test_zeroes_in_and_out_different_shape_on_test, test/dynamo/test_export.py::ExportTests::test_zeroes_in_and_out_different_shape_on_test_with_aten_graph, test/dynamo/test_export.py::ExportTests::test_zeroes_in_new_shape_scalar_out, test/dynamo/test_export.py::ExportTests::test_zeroes_in_new_shape_scalar_out_permute, test/dynamo/test_export.py::ExportTests::test_zeroes_in_new_shape_scalar_out_permute_dupe_and_bypass, test/dynamo/test_export.py::ExportTestsDeviceCUDA::test_export_fast_binary_broadcast_check_cuda, test/dynamo/test_export.py::ExportTestsDeviceCUDA::test_export_fast_binary_broadcast_check_unbacked_cuda, test/dynamo/test_export.py::ExportTestsDeviceCUDA::test_export_with_parameters_cuda
2025-12-04T13:00:07.0372632Z 
2025-12-04T13:00:07.0372843Z Finished dynamo/test_export 1/1 ... [2025-12-04 13:00:07.029110][14846.957407523], took 0.42min
2025-12-04T13:00:07.0578983Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_export/dynamo.test_export-b39b0ef66a188fdf.xml
2025-12-04T13:00:07.1568100Z Running dynamo/test_python_dispatcher 1/1 ... [2025-12-04 13:00:07.156570][14847.084869568]
2025-12-04T13:00:07.1568852Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:00:07.1571129Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_python_dispatcher.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:00:07.156848]
2025-12-04T13:00:11.4285251Z 
2025-12-04T13:00:11.4287485Z dynamo/test_python_dispatcher 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_python_dispatcher_1.1_786a010f0f7c3940_.log
2025-12-04T13:00:11.4290386Z Running 6 items in this shard: test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key1, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key2, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key3, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key4, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key_set_guard, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_functorch_interpreter
2025-12-04T13:00:11.4291971Z 
2025-12-04T13:00:11.4292206Z Finished dynamo/test_python_dispatcher 1/1 ... [2025-12-04 13:00:11.428114][14851.356408619], took 0.07min
2025-12-04T13:00:11.4568965Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_python_dispatcher/dynamo.test_python_dispatcher-89e5f4289c609732.xml
2025-12-04T13:00:11.4905676Z Running export/test_swap 1/1 ... [2025-12-04 13:00:11.490332][14851.418631126]
2025-12-04T13:00:11.4906348Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:00:11.4909434Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_swap.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:00:11.490611]
2025-12-04T13:00:17.1644093Z 
2025-12-04T13:00:17.1645179Z export/test_swap 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_swap_1.1_95c916183e2c0305_.log
2025-12-04T13:00:17.1650667Z Running 20 items in this shard: test/export/test_swap.py::TestSwap_nonstrict::test_custom_input_args, test/export/test_swap.py::TestSwap_nonstrict::test_custom_input_kwargs, test/export/test_swap.py::TestSwap_nonstrict::test_custom_input_kwargs_use_private, test/export/test_swap.py::TestSwap_nonstrict::test_custom_output, test/export/test_swap.py::TestSwap_nonstrict::test_dedup_sym_size, test/export/test_swap.py::TestSwap_nonstrict::test_nested_leaf, test/export/test_swap.py::TestSwap_nonstrict::test_remove_duplicate_pytree_different_order, test/export/test_swap.py::TestSwap_nonstrict::test_remove_duplicate_pytree_simple, test/export/test_swap.py::TestSwap_nonstrict::test_unflatten_preserve_signature, test/export/test_swap.py::TestSwap_nonstrict::test_unflatten_preserve_with_unused_input, test/export/test_swap.py::TestSwap_strict::test_custom_input_args, test/export/test_swap.py::TestSwap_strict::test_custom_input_kwargs, test/export/test_swap.py::TestSwap_strict::test_custom_input_kwargs_use_private, test/export/test_swap.py::TestSwap_strict::test_custom_output, test/export/test_swap.py::TestSwap_strict::test_dedup_sym_size, test/export/test_swap.py::TestSwap_strict::test_nested_leaf, test/export/test_swap.py::TestSwap_strict::test_remove_duplicate_pytree_different_order, test/export/test_swap.py::TestSwap_strict::test_remove_duplicate_pytree_simple, test/export/test_swap.py::TestSwap_strict::test_unflatten_preserve_signature, test/export/test_swap.py::TestSwap_strict::test_unflatten_preserve_with_unused_input
2025-12-04T13:00:17.1654961Z 
2025-12-04T13:00:17.1655153Z Finished export/test_swap 1/1 ... [2025-12-04 13:00:17.164154][14857.092451423], took 0.09min
2025-12-04T13:00:17.1930037Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_swap/export.test_swap-4cc8060b16634bb1.xml
2025-12-04T13:00:17.2268365Z Running export/test_unflatten 1/1 ... [2025-12-04 13:00:17.226565][14857.154864024]
2025-12-04T13:00:17.2269081Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:00:17.2271878Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_unflatten.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:00:17.226864]
2025-12-04T13:00:29.4114830Z 
2025-12-04T13:00:29.4115978Z export/test_unflatten 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_unflatten_1.1_d630949213564262_.log
2025-12-04T13:00:29.4126075Z Running 29 items in this shard: test/export/test_unflatten.py::TestUnflatten::test_assert_tensor_metadata_stack, test/export/test_unflatten.py::TestUnflatten::test_attr_as_submod_input, test/export/test_unflatten.py::TestUnflatten::test_dedup_sym_size, test/export/test_unflatten.py::TestUnflatten::test_double_nested_submodule, test/export/test_unflatten.py::TestUnflatten::test_duplicate_placeholder, test/export/test_unflatten.py::TestUnflatten::test_fx_trace, test/export/test_unflatten.py::TestUnflatten::test_nested_leaf_non_strict, test/export/test_unflatten.py::TestUnflatten::test_placeholder_and_get_attr_ordering_after_unflattened, test/export/test_unflatten.py::TestUnflatten::test_simple_alias, test/export/test_unflatten.py::TestUnflatten::test_unflatten_buffer_mutation, test/export/test_unflatten.py::TestUnflatten::test_unflatten_constant_obj, test/export/test_unflatten.py::TestUnflatten::test_unflatten_constant_tensor, test/export/test_unflatten.py::TestUnflatten::test_unflatten_container_type, test/export/test_unflatten.py::TestUnflatten::test_unflatten_eager, test/export/test_unflatten.py::TestUnflatten::test_unflatten_empty_branch, test/export/test_unflatten.py::TestUnflatten::test_unflatten_nested, test/export/test_unflatten.py::TestUnflatten::test_unflatten_nested_access, test/export/test_unflatten.py::TestUnflatten::test_unflatten_none, test/export/test_unflatten.py::TestUnflatten::test_unflatten_param_list_dict, test/export/test_unflatten.py::TestUnflatten::test_unflatten_preserve_signature, test/export/test_unflatten.py::TestUnflatten::test_unflatten_preserve_with_unused_input, test/export/test_unflatten.py::TestUnflatten::test_unflatten_requires_grad_param, test/export/test_unflatten.py::TestUnflatten::test_unflatten_root_module_type, test/export/test_unflatten.py::TestUnflatten::test_unflatten_shared_submodule, test/export/test_unflatten.py::TestUnflatten::test_unflatten_skipped_call_module, test/export/test_unflatten.py::TestUnflatten::test_unflatten_submodule_ordering, test/export/test_unflatten.py::TestUnflatten::test_unflatten_with_inplace_compile, test/export/test_unflatten.py::TestUnflatten::test_unflatten_wrong_input, test/export/test_unflatten.py::TestUnflatten::test_unflattened_module_nodes_has_meta_val
2025-12-04T13:00:29.4132765Z 
2025-12-04T13:00:29.4133015Z Finished export/test_unflatten 1/1 ... [2025-12-04 13:00:29.411194][14869.339486878], took 0.20min
2025-12-04T13:00:29.4407434Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_unflatten/export.test_unflatten-9dcd937885307cef.xml
2025-12-04T13:00:29.5289520Z Running dynamo/test_verify_correctness 1/1 ... [2025-12-04 13:00:29.528694][14869.456992509]
2025-12-04T13:00:29.5290145Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:00:29.5292684Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_verify_correctness.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:00:29.528981]
2025-12-04T13:00:33.5008341Z 
2025-12-04T13:00:33.5009272Z dynamo/test_verify_correctness 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_verify_correctness_1.1_d2d881eebc4cfc16_.log
2025-12-04T13:00:33.5011342Z Running 4 items in this shard: test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_example_inputs, test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_incorrect_verify_false, test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_incorrect_verify_true, test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_torchscript
2025-12-04T13:00:33.5012749Z 
2025-12-04T13:00:33.5013071Z Finished dynamo/test_verify_correctness 1/1 ... [2025-12-04 13:00:33.500548][14873.428843712], took 0.07min
2025-12-04T13:00:33.5293579Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_verify_correctness/dynamo.test_verify_correctness-22b40053bc190597.xml
2025-12-04T13:00:33.5551507Z Running dynamo/test_wrap_inductor_compiled_regions 1/1 ... [2025-12-04 13:00:33.554914][14873.483212913]
2025-12-04T13:00:33.5552068Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:00:33.5554536Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_wrap_inductor_compiled_regions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:00:33.555202]
2025-12-04T13:00:50.9991367Z 
2025-12-04T13:00:50.9992658Z dynamo/test_wrap_inductor_compiled_regions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_wrap_inductor_compiled_regions_1.1_3fbc3d993fa8b554_.log
2025-12-04T13:00:51.0001192Z Running 18 items in this shard: test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_with_sac_must_save, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_with_sac_prefer_recompute, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_with_wrapper_basic, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_wrapper_visible_in_debug_mode, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_wrapper_with_backward, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_flex_attention_wrapper_with_cache, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_sac_outer_compile_inner_basic, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_sac_outer_compile_inner_flex_attention, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_config_affects_cache_key, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_default_disabled, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_disabled_not_visible_in_debug_mode, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_enabled_visible_in_debug_mode, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_no_dispatch_mode_no_hop_invoked, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_option_type_validation, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_per_compilation, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_with_backward, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_with_cache, test/dynamo/test_wrap_inductor_compiled_regions.py::TestWrapInductorCompiledRegions::test_wrap_with_multiple_ops
2025-12-04T13:00:51.0007866Z 
2025-12-04T13:00:51.0008142Z Finished dynamo/test_wrap_inductor_compiled_regions 1/1 ... [2025-12-04 13:00:50.998844][14890.927135691], took 0.29min
2025-12-04T13:00:51.0280721Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_wrap_inductor_compiled_regions/dynamo.test_wrap_inductor_compiled_regions-e50f738759450405.xml
2025-12-04T13:00:51.1085537Z Running dynamo/test_cudagraphs_expandable_segments 1/1 ... [2025-12-04 13:00:51.108271][14891.036570516]
2025-12-04T13:00:51.1086070Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:00:51.1088729Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_cudagraphs_expandable_segments.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:00:51.108589]
2025-12-04T13:00:56.3821463Z 
2025-12-04T13:00:56.3823543Z dynamo/test_cudagraphs_expandable_segments 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_cudagraphs_expandable_segments_1.1_461bf64d7b370157_.log
2025-12-04T13:00:56.3828472Z Running 8 items in this shard: test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_basic, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_dead_fill, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_dtoh, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_factory, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_htod, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_mutate_constant, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_mutate_input, test/dynamo/test_cudagraphs_expandable_segments.py::TestAotCudagraphs::test_mutated_metadata
2025-12-04T13:00:56.3831258Z 
2025-12-04T13:00:56.3831547Z Finished dynamo/test_cudagraphs_expandable_segments 1/1 ... [2025-12-04 13:00:56.381747][14896.309998118], took 0.09min
2025-12-04T13:00:56.4112807Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_cudagraphs_expandable_segments/dynamo.test_cudagraphs_expandable_segments-6088bb8977cfc034.xml
2025-12-04T13:00:56.4419725Z Running inductor/test_caching 1/1 ... [2025-12-04 13:00:56.441740][14896.370039306]
2025-12-04T13:00:56.4420175Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:00:56.4422916Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_caching.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:00:56.442046]
2025-12-04T13:02:43.5653375Z 
2025-12-04T13:02:43.5654545Z inductor/test_caching 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_caching_1.1_ee4444a2502ab159_.log
2025-12-04T13:02:43.5718685Z Running 169 items in this shard: test/inductor/test_caching.py::ConfigTest::test_versioned_config_env_var_override_enabled_False, test/inductor/test_caching.py::ConfigTest::test_versioned_config_env_var_override_enabled_True, test/inductor/test_caching.py::ConfigTest::test_versioned_config_jk_failure, test/inductor/test_caching.py::ConfigTest::test_versioned_config_oss_default_enabled_False, test/inductor/test_caching.py::ConfigTest::test_versioned_config_oss_default_enabled_True, test/inductor/test_caching.py::ConfigTest::test_versioned_config_version_check_enabled_False, test/inductor/test_caching.py::ConfigTest::test_versioned_config_version_check_enabled_True, test/inductor/test_caching.py::ContextTest::test_all_or_none_isolation_context_all_runtime_context_False_all_compile_context_False, test/inductor/test_caching.py::ContextTest::test_all_or_none_isolation_context_all_runtime_context_False_all_compile_context_True, test/inductor/test_caching.py::ContextTest::test_all_or_none_isolation_context_all_runtime_context_True_all_compile_context_False, test/inductor/test_caching.py::ContextTest::test_all_or_none_isolation_context_all_runtime_context_True_all_compile_context_True, test/inductor/test_caching.py::ContextTest::test_isolation_key_is_distinct, test/inductor/test_caching.py::ContextTest::test_isolation_key_is_repeatable, test/inductor/test_caching.py::ContextTest::test_select_compile_context_matches_forms_of_context, test/inductor/test_caching.py::ContextTest::test_select_runtime_context_matches_forms_of_context, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected0, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected1, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected10, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected2, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected3, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected4, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected5, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected6, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected7, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected8, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected0_compile_forms_of_context_selected9, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected0, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected1, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected10, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected2, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected3, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected4, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected5, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected6, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected7, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected8, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected1_compile_forms_of_context_selected9, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected0, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected1, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected10, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected2, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected3, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected4, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected5, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected6, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected7, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected8, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected2_compile_forms_of_context_selected9, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected0, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected1, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected10, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected2, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected3, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected4, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected5, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected6, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected7, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected8, test/inductor/test_caching.py::ContextTest::test_selected_isolation_context_runtime_forms_of_context_selected3_compile_forms_of_context_selected9, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_CacheError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_CustomParamsEncoderRequiredError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_CustomResultDecoderRequiredError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_CustomResultEncoderRequiredError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_DeterministicCachingRequiresStrongConsistencyError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_FileLockTimeoutError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_KeyEncodingError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_KeyPicklingError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_LockTimeoutError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_SystemError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_UserError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_ValueDecodingError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_ValueEncodingError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_ValuePicklingError, test/inductor/test_caching.py::ExceptionsTest::test_exception_is_CacheError_exception_typename_ValueUnPicklingError, test/inductor/test_caching.py::ExceptionsTest::test_exception_other, test/inductor/test_caching.py::ImplementationsTest::test_get_impl_typename__InMemoryCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_get_impl_typename__OnDiskCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_insert_impl_typename__InMemoryCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_insert_impl_typename__OnDiskCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_insert_will_not_overwrite_impl_typename__InMemoryCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_insert_will_not_overwrite_impl_typename__OnDiskCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_key_encoding_impl_typename__InMemoryCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_key_encoding_impl_typename__OnDiskCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_value_decoding_impl_typename__InMemoryCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_value_decoding_impl_typename__OnDiskCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_value_encoding_impl_typename__InMemoryCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_value_encoding_impl_typename__OnDiskCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_version_mismatch_impl_typename__InMemoryCacheImpl, test/inductor/test_caching.py::ImplementationsTest::test_version_mismatch_impl_typename__OnDiskCacheImpl, test/inductor/test_caching.py::InterfacesTest::test_caching_module_disabled_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_caching_module_disabled_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_custom_ischema_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_custom_ischema_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_custom_params_encoder_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_custom_params_encoder_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_custom_result_encoder_and_decoder_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_custom_result_encoder_and_decoder_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_defaults_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_defaults_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_deterministic_caching_disabled, test/inductor/test_caching.py::InterfacesTest::test_params_encoder_required_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_params_encoder_required_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_result_encoder_and_decoder_required_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_result_encoder_and_decoder_required_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_result_encoder_required_intf_typename__DeterministicCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_result_encoder_required_intf_typename__FastCacheIntf, test/inductor/test_caching.py::InterfacesTest::test_strictly_cached_determinism, test/inductor/test_caching.py::InterfacesTest::test_strictly_pre_populated_determinism, test/inductor/test_caching.py::LocksTest::test_BLOCKING, test/inductor/test_caching.py::LocksTest::test_BLOCKING_WITH_TIMEOUT, test/inductor/test_caching.py::LocksTest::test_NON_BLOCKING, test/inductor/test_caching.py::LocksTest::test_acquire_many_impl_locks_with_timeout_impl_typename_combos0, test/inductor/test_caching.py::LocksTest::test_acquire_many_impl_locks_with_timeout_impl_typename_combos1, test/inductor/test_caching.py::LocksTest::test_acquire_many_impl_locks_with_timeout_impl_typename_combos2, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_safe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_safe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_safe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_safe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_FileLock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_safe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_WITH_TIMEOUT_acquisition_mode_unsafe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_safe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_safe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_safe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_safe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_BLOCKING_acquisition_mode_unsafe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_safe_release_unlocked, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_after_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_before_timeout, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_never, test/inductor/test_caching.py::LocksTest::test_acquire_with_timeout_lock_typename_Lock_lock_timeout_NON_BLOCKING_acquisition_mode_unsafe_release_unlocked, test/inductor/test_caching.py::UtilsTest::test_lru_cache, test/inductor/test_caching.py::UtilsTest::test_try_pickle_key_pickle_able_False, test/inductor/test_caching.py::UtilsTest::test_try_pickle_key_pickle_able_True, test/inductor/test_caching.py::UtilsTest::test_try_pickle_value_pickle_able_False, test/inductor/test_caching.py::UtilsTest::test_try_pickle_value_pickle_able_True, test/inductor/test_caching.py::UtilsTest::test_try_unpickle_value_unpickle_able_False, test/inductor/test_caching.py::UtilsTest::test_try_unpickle_value_unpickle_able_True
2025-12-04T13:02:43.5779318Z 
2025-12-04T13:02:43.5779556Z Finished inductor/test_caching 1/1 ... [2025-12-04 13:02:43.565301][15003.493596006], took 1.79min
2025-12-04T13:02:43.5960242Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_caching/inductor.test_caching-81c36c30e8c9f16c.xml
2025-12-04T13:02:43.6726708Z Running dynamo/test_reorder_logs 1/1 ... [2025-12-04 13:02:43.672413][15003.600710394]
2025-12-04T13:02:43.6727186Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:02:43.6729621Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_reorder_logs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:02:43.672702]
2025-12-04T13:02:48.2954227Z 
2025-12-04T13:02:48.2955093Z dynamo/test_reorder_logs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_reorder_logs_1.1_bce7080c1339fb4f_.log
2025-12-04T13:02:48.2960964Z Running 14 items in this shard: test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method0_fn0_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method1_fn1_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method2_fn2_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method3_fn3_should_ignore_logger_False, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method4_fn4_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method5_fn5_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method6_fn6_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::IgnoreLogsTests::test_ignore_logger_ignore_method7_fn7_should_ignore_logger_True, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_constant_mutation, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_dont_reorder_print, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_custom_log_fn, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_print, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_print_graph_break, test/dynamo/test_reorder_logs.py::ReorderLogsTests::test_reorder_warnings
2025-12-04T13:02:48.2965197Z 
2025-12-04T13:02:48.2965414Z Finished dynamo/test_reorder_logs 1/1 ... [2025-12-04 13:02:48.295190][15008.223484427], took 0.08min
2025-12-04T13:02:48.3249999Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_reorder_logs/dynamo.test_reorder_logs-8774cf5ade30b7b9.xml
2025-12-04T13:02:48.3578775Z Running dynamo/test_subclasses 1/1 ... [2025-12-04 13:02:48.357633][15008.285931799]
2025-12-04T13:02:48.3579226Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:02:48.3581933Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_subclasses.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:02:48.357931]
2025-12-04T13:03:19.9250150Z 
2025-12-04T13:03:19.9255832Z dynamo/test_subclasses 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_subclasses_1.1_0f5f75f18480ff60_.log
2025-12-04T13:03:19.9289730Z Running 126 items in this shard: test/dynamo/test_subclasses.py::SubclassTests::test_as_subclass_attr_mutation, test/dynamo/test_subclasses.py::SubclassTests::test_base_torch_function_tracing, test/dynamo/test_subclasses.py::SubclassTests::test_compile_higher_order_with_functionalization, test/dynamo/test_subclasses.py::SubclassTests::test_compile_with_fake_tensor_automatic_dynamic, test/dynamo/test_subclasses.py::SubclassTests::test_compile_with_fake_tensor_dynamic_dim, test/dynamo/test_subclasses.py::SubclassTests::test_compile_with_functionalization, test/dynamo/test_subclasses.py::SubclassTests::test_disable_all_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_disable_all_torch_function_restore_values, test/dynamo/test_subclasses.py::SubclassTests::test_disable_all_torch_function_restore_values_graph_break, test/dynamo/test_subclasses.py::SubclassTests::test_has_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_make_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_mark_static_with_subclass_desugaring_dynamic_False, test/dynamo/test_subclasses.py::SubclassTests::test_mark_static_with_subclass_desugaring_dynamic_True, test/dynamo/test_subclasses.py::SubclassTests::test_newly_constructed_tensor_subclass_attr_mutation, test/dynamo/test_subclasses.py::SubclassTests::test_njt_subclass_from_buffer, test/dynamo/test_subclasses.py::SubclassTests::test_njt_subclass_from_cat, test/dynamo/test_subclasses.py::SubclassTests::test_njt_subclass_simple, test/dynamo/test_subclasses.py::SubclassTests::test_no_call_to_new, test/dynamo/test_subclasses.py::SubclassTests::test_no_torch_function_on_size_bytecode, test/dynamo/test_subclasses.py::SubclassTests::test_no_torch_function_recompiles, test/dynamo/test_subclasses.py::SubclassTests::test_nontraceable_tensor_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_overridden_method_guarding, test/dynamo/test_subclasses.py::SubclassTests::test_parameter_subclass_custom_torch_func_and_dynamic_attr, test/dynamo/test_subclasses.py::SubclassTests::test_parameter_subclass_with_old_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_recompile_with_symbool_inputs, test/dynamo/test_subclasses.py::SubclassTests::test_recompiles_with_optional_inner_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_return_as_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_return_local_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_return_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_TwoTensor_TwoTensor_TwoTensor, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_TwoTensor_nested_diff_sizes, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_constructor_proxying, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_dont_invoke_torch_function_on_overridden_attr, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_dont_invoke_torch_function_on_overridden_method, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_override_shape_and_to, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_parameters_are_static_under_training, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_views_dynamic_False, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_views_dynamic_True, test/dynamo/test_subclasses.py::SubclassTests::test_subclass_with_disabled_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_support_bases, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_automatic_dynamic_shapes, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_clone_view, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_different_shape, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_mark_dynamic_shapes, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_mul, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_nested, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_return_multiple, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_return_shape, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_return_tensor_and_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_simple, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_view, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_TwoTensor_view_mul, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_attr_codegen_tos, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_custom_guards_error_arg_num, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_custom_guards_error_not_classmethod, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_custom_guards_override, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_guards, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_ctx_recursive_guards, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_custom_attr, test/dynamo/test_subclasses.py::SubclassTests::test_tensor_subclass_with_non_classmethod_torch_function, test/dynamo/test_subclasses.py::SubclassTests::test_torch_dispatch_subclass_guard_recompile, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_call_on_attr, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_call_on_method, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_call_on_method_arg, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_list_args, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_graph_break, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_guards, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_nested, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_state_tracing, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_subclass_survives_into_aot_autograd, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_wrapper_class, test/dynamo/test_subclasses.py::SubclassTests::test_torch_function_wrapper_class_with_kwargs, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_equality_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_equality_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_identity_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_identity_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_isinstance_subclass, test/dynamo/test_subclasses.py::SubclassTests::test_type_check_isinstance_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_user_overridden_attr_unsupported, test/dynamo/test_subclasses.py::SubclassTests::test_user_overridden_method_unsupported, test/dynamo/test_subclasses.py::SubclassTests::test_user_overridden_property_unsupported, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_dynamo_attribute_access_on_intermediate, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_guards_on_inner_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_with_differently_sized_inner_tensor, test/dynamo/test_subclasses.py::SubclassTests::test_wrapper_subclass_with_same_sized_inner_tensor, test/dynamo/test_subclasses.py::TestNestedTensor::test_basic_autograd, test/dynamo/test_subclasses.py::TestNestedTensor::test_basic_autograd_inductor, test/dynamo/test_subclasses.py::TestNestedTensor::test_binary_does_not_recompile, test/dynamo/test_subclasses.py::TestNestedTensor::test_binary_recompiles, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_2, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_4, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_5, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input_6, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_2, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_3, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_4, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_intermediate_5, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_mixed, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_mixed_2, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_mixed_3, test/dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_is_nested_call, test/dynamo/test_subclasses.py::TestNestedTensor::test_inference_tensor, test/dynamo/test_subclasses.py::TestNestedTensor::test_inline_nested_tensor_from_jagged, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_basic, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_False_False, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_False_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_True_False, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_leaf_True_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_False_obscure, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_basic, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_False_False, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_False_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_True_False, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_leaf_True_True, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_base_is_nt_True_obscure, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_dense_subclass_dense_subclass, test/dynamo/test_subclasses.py::TestNestedTensor::test_inputs_to_compiled_fn_are_views_nt_view_name_subclass_dense, test/dynamo/test_subclasses.py::TestNestedTensor::test_param_subclass_isinstance_input, test/dynamo/test_subclasses.py::TestNestedTensor::test_return_shape, test/dynamo/test_subclasses.py::TestNestedTensor::test_subclass_dense_subclass_dense_view, test/dynamo/test_subclasses.py::TestNestedTensor::test_subclass_gives_static_shapes_when_dynamic_false, test/dynamo/test_subclasses.py::TestNestedTensor::test_subclass_with_mutation_in_graph, test/dynamo/test_subclasses.py::TestNestedTensor::test_unary_does_not_recompile, test/dynamo/test_subclasses.py::TestNestedTensor::test_unbind
2025-12-04T13:03:19.9322043Z 
2025-12-04T13:03:19.9322276Z Finished dynamo/test_subclasses 1/1 ... [2025-12-04 13:03:19.925249][15039.853537264], took 0.53min
2025-12-04T13:03:19.9556809Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_subclasses/dynamo.test_subclasses-debad3a483737c49.xml
2025-12-04T13:03:20.0410574Z Running dynamo/test_comptime 1/1 ... [2025-12-04 13:03:20.040824][15039.969122225]
2025-12-04T13:03:20.0411198Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:03:20.0414023Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_comptime.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:03:20.041141]
2025-12-04T13:03:29.2728986Z 
2025-12-04T13:03:29.2730216Z dynamo/test_comptime 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_comptime_1.1_ee7f00d5f391ca6c_.log
2025-12-04T13:03:29.2733562Z Running 12 items in this shard: test/dynamo/test_comptime.py::ComptimeTests::test_get_local, test/dynamo/test_comptime.py::ComptimeTests::test_get_local_closure_variable, test/dynamo/test_comptime.py::ComptimeTests::test_graph_break, test/dynamo/test_comptime.py::ComptimeTests::test_print_bt, test/dynamo/test_comptime.py::ComptimeTests::test_print_direct, test/dynamo/test_comptime.py::ComptimeTests::test_print_disas, test/dynamo/test_comptime.py::ComptimeTests::test_print_graph, test/dynamo/test_comptime.py::ComptimeTests::test_print_guards, test/dynamo/test_comptime.py::ComptimeTests::test_print_locals, test/dynamo/test_comptime.py::ComptimeTests::test_print_single, test/dynamo/test_comptime.py::ComptimeTests::test_print_value_stack, test/dynamo/test_comptime.py::ComptimeTests::test_sleep
2025-12-04T13:03:29.2735831Z 
2025-12-04T13:03:29.2736052Z Finished dynamo/test_comptime 1/1 ... [2025-12-04 13:03:29.272553][15049.200842411], took 0.15min
2025-12-04T13:03:29.3024738Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_comptime/dynamo.test_comptime-47f6d1d4947f2a1a.xml
2025-12-04T13:03:29.3824680Z Running test_privateuseone_python_backend 1/1 ... [2025-12-04 13:03:29.382170][15049.310467862]
2025-12-04T13:03:29.3825479Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:03:29.3827753Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_privateuseone_python_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:03:29.382467]
2025-12-04T13:03:32.5527714Z 
2025-12-04T13:03:32.5528884Z test_privateuseone_python_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_privateuseone_python_backend_1.1_d1fcfb50f1d5d34a_.log
2025-12-04T13:03:32.5530327Z Running 2 items in this shard: test/test_privateuseone_python_backend.py::PrivateUse1BackendTest::test_accessing_is_pinned, test/test_privateuseone_python_backend.py::PrivateUse1BackendTest::test_backend_simple
2025-12-04T13:03:32.5531124Z 
2025-12-04T13:03:32.5531443Z Finished test_privateuseone_python_backend 1/1 ... [2025-12-04 13:03:32.552378][15052.480667337], took 0.05min
2025-12-04T13:03:32.5829144Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_privateuseone_python_backend/test_privateuseone_python_backend-af92c89cf1e734fe.xml
2025-12-04T13:03:32.6162381Z Running functorch/test_rearrange 1/1 ... [2025-12-04 13:03:32.615972][15052.544270333]
2025-12-04T13:03:32.6163159Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:03:32.6165967Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_rearrange.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:03:32.616280]
2025-12-04T13:03:35.8864991Z 
2025-12-04T13:03:35.8866069Z functorch/test_rearrange 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_rearrange_1.1_f58f64fb207f9ea4_.log
2025-12-04T13:03:35.8869563Z Running 10 items in this shard: test/functorch/test_rearrange.py::TestRearrange::test_0_dim_tensor, test/functorch/test_rearrange.py::TestRearrange::test_collapsed_ellipsis_errors_out, test/functorch/test_rearrange.py::TestRearrange::test_concatenations_and_stacking, test/functorch/test_rearrange.py::TestRearrange::test_dimension_mismatch_no_ellipsis, test/functorch/test_rearrange.py::TestRearrange::test_dimension_mismatch_with_ellipsis, test/functorch/test_rearrange.py::TestRearrange::test_ellipsis_ops, test/functorch/test_rearrange.py::TestRearrange::test_rearrange_consistency, test/functorch/test_rearrange.py::TestRearrange::test_rearrange_permutations, test/functorch/test_rearrange.py::TestRearrange::test_squeeze, test/functorch/test_rearrange.py::TestRearrange::test_unsqueeze
2025-12-04T13:03:35.8872432Z 
2025-12-04T13:03:35.8872708Z Finished functorch/test_rearrange 1/1 ... [2025-12-04 13:03:35.886225][15055.814516856], took 0.05min
2025-12-04T13:03:35.9162739Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_rearrange/functorch.test_rearrange-2881d42f49f0d5f4.xml
2025-12-04T13:03:35.9633621Z Running functorch/test_parsing 1/1 ... [2025-12-04 13:03:35.963115][15055.891414039]
2025-12-04T13:03:35.9634389Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:03:35.9637295Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_parsing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:03:35.963419]
2025-12-04T13:03:39.1839226Z 
2025-12-04T13:03:39.1841015Z functorch/test_parsing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_parsing_1.1_ab4f4097a5615c37_.log
2025-12-04T13:03:39.1845980Z Running 12 items in this shard: test/functorch/test_parsing.py::TestAnonymousAxis::test_anonymous_axes, test/functorch/test_parsing.py::TestParsedExpression::test_elementary_axis_name, test/functorch/test_parsing.py::TestParsedExpression::test_invalid_expressions, test/functorch/test_parsing.py::TestParsedExpression::test_parse_expression, test/functorch/test_parsing.py::TestParsingUtils::test_ellipsis_invalid_identifier, test/functorch/test_parsing.py::TestParsingUtils::test_ellipsis_matching, test/functorch/test_parsing.py::TestParsingUtils::test_left_parenthesized_ellipsis, test/functorch/test_parsing.py::TestParsingUtils::test_parse_pattern_number_of_arrows, test/functorch/test_parsing.py::TestValidateRearrangeExpressions::test_identifier_mismatch, test/functorch/test_parsing.py::TestValidateRearrangeExpressions::test_non_unitary_anonymous_axes_raises_error, test/functorch/test_parsing.py::TestValidateRearrangeExpressions::test_unexpected_axes_lengths, test/functorch/test_parsing.py::TestValidateRearrangeExpressions::test_validate_axes_lengths_are_integers
2025-12-04T13:03:39.1849520Z 
2025-12-04T13:03:39.1849762Z Finished functorch/test_parsing 1/1 ... [2025-12-04 13:03:39.183430][15059.111719469], took 0.05min
2025-12-04T13:03:39.2147309Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_parsing/functorch.test_parsing-b19da695e97869b8.xml
2025-12-04T13:03:39.2441926Z Running test_varlen_attention 1/1 ... [2025-12-04 13:03:39.243955][15059.172253813]
2025-12-04T13:03:39.2442643Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:03:39.2445650Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_varlen_attention.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:03:39.244252]
2025-12-04T13:03:45.1190593Z 
2025-12-04T13:03:45.1191647Z test_varlen_attention 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_varlen_attention_1.1_7b4553019fa7c3b1_.log
2025-12-04T13:03:45.1199722Z Running 22 items in this shard: test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_basic_functionality_bfloat16_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_basic_functionality_float16_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_bfloat16_is_causal_False_num_perms_1_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_bfloat16_is_causal_False_num_perms_3_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_bfloat16_is_causal_False_num_perms_5_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_bfloat16_is_causal_True_num_perms_1_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_bfloat16_is_causal_True_num_perms_3_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_bfloat16_is_causal_True_num_perms_5_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_float16_is_causal_False_num_perms_1_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_float16_is_causal_False_num_perms_3_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_float16_is_causal_False_num_perms_5_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_float16_is_causal_True_num_perms_1_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_float16_is_causal_True_num_perms_3_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_batch_invariance_float16_is_causal_True_num_perms_5_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_custom_op_compliance_bfloat16_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_custom_op_compliance_float16_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_custom_op_registration_bfloat16_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_custom_op_registration_float16_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_varlen_vs_sdpa_bfloat16_is_causal_False_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_varlen_vs_sdpa_bfloat16_is_causal_True_cuda_bfloat16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_varlen_vs_sdpa_float16_is_causal_False_cuda_float16, test/test_varlen_attention.py::TestVarlenAttentionCUDA::test_varlen_vs_sdpa_float16_is_causal_True_cuda_float16
2025-12-04T13:03:45.1207416Z 
2025-12-04T13:03:45.1207630Z Finished test_varlen_attention 1/1 ... [2025-12-04 13:03:45.118676][15065.046967557], took 0.10min
2025-12-04T13:03:45.1500679Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_varlen_attention/test_varlen_attention-c77e9d85fce2d7de.xml
2025-12-04T13:03:45.1823625Z Running test_mkl_verbose 1/1 ... [2025-12-04 13:03:45.182114][15065.110413069]
2025-12-04T13:03:45.1824042Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:03:45.1826894Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mkl_verbose.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:03:45.182421]
2025-12-04T13:03:51.8089096Z 
2025-12-04T13:03:51.8090029Z test_mkl_verbose 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkl_verbose_1.1_8c6cbee907023829_.log
2025-12-04T13:03:51.8091076Z Running 2 items in this shard: test/test_mkl_verbose.py::TestMKLVerbose::test_verbose_off, test/test_mkl_verbose.py::TestMKLVerbose::test_verbose_on
2025-12-04T13:03:51.8091637Z 
2025-12-04T13:03:51.8091869Z Finished test_mkl_verbose 1/1 ... [2025-12-04 13:03:51.808552][15071.736843761], took 0.11min
2025-12-04T13:03:51.8394672Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-aebaabaedf0418a4.xml
2025-12-04T13:03:51.9189274Z Running test_cpp_api_parity 1/1 ... [2025-12-04 13:03:51.918674][15071.846973157]
2025-12-04T13:03:51.9189939Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:03:51.9192595Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_api_parity.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:03:51.919001]
2025-12-04T13:04:08.2115668Z 
2025-12-04T13:04:08.2116504Z test_cpp_api_parity 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_api_parity_1.1_c465bf1c09b2866a_.log
2025-12-04T13:04:08.2244997Z Running 488 items in this shard: test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1size1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1size1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2size1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2size1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_reflect_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_reflect_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_stride, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_stride_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_padded, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_padded_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_with_multiplier, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_with_multiplier_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_thnn, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_thnn_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_padding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_padding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_reflect_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_reflect_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_1x1x1_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_1x1x1_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_padding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_padding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CrossMapLRN2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CrossMapLRN2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_discontiguous, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_discontiguous_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sparse, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sparse_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_discontiguous, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_discontiguous_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_sparse, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_sparse_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_LayerNorm_3d_no_affine_large_feature, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_LayerNorm_3d_no_affine_large_feature_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_lhs, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_lhs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_rhs, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_rhs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_with_non_default_args, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_with_non_default_args_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelShuffle, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelShuffle_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelUnshuffle, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelUnshuffle_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_complex, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_complex_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_has_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_has_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_no_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_no_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_gelu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_gelu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_relu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_relu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_gelu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_gelu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_relu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_relu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Transformer_multilayer_coder, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Transformer_multilayer_coder_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unflatten_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unflatten_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_legacy_enum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_legacy_enum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_margin_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_margin_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HuberLoss_delta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HuberLoss_delta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_log_target, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_log_target_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_log_target, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_log_target_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_log_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_log_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_complex, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_complex_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_0d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_0d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_1d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_1d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_index_neg, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_index_neg_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_1d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_1d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_margin_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_margin_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_p_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_p_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_neg, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_neg_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_PoissonNLLLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_PoissonNLLLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_beta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_beta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_zero_beta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_zero_beta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SoftMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SoftMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_shared_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_shared_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_shared_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_shared_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_tuple_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_tuple_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_launch_configs, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_launch_configs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim0, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim0_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim3, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim3_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_lastdim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_lastdim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_special, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_special_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_multimarginloss_1d_input_0d_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_multimarginloss_1d_input_0d_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_has_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_has_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_no_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_no_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim0, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim0_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim3, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim3_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_dtype, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_dtype_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_dtype, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_dtype_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_special, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_special_cuda
2025-12-04T13:04:08.2369084Z 
2025-12-04T13:04:08.2369418Z Finished test_cpp_api_parity 1/1 ... [2025-12-04 13:04:08.212067][15088.140362196], took 0.27min
2025-12-04T13:04:08.2448992Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cpp_api_parity/test_cpp_api_parity-b67630ba146aa7a3.xml
2025-12-04T13:04:08.3302730Z Running test_autoload 1/1 ... [2025-12-04 13:04:08.330017][15088.258315537]
2025-12-04T13:04:08.3303150Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:04:08.3305840Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autoload.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:04:08.330314]
2025-12-04T13:04:11.5005818Z 
2025-12-04T13:04:11.5006580Z test_autoload 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autoload_1.1_7ada199d937afcb8_.log
2025-12-04T13:04:11.5007479Z Running 1 items in this shard: test/test_autoload.py::TestDeviceBackendAutoload::test_autoload
2025-12-04T13:04:11.5007864Z 
2025-12-04T13:04:11.5008112Z Finished test_autoload 1/1 ... [2025-12-04 13:04:11.500258][15091.428550028], took 0.05min
2025-12-04T13:04:11.5324687Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_autoload/test_autoload-f8ddaf02f0fba12a.xml
2025-12-04T13:04:11.5592470Z Running nn/attention/test_open_registry 1/1 ... [2025-12-04 13:04:11.559008][15091.487307781]
2025-12-04T13:04:11.5592965Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:04:11.5595803Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/attention/test_open_registry.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:04:11.559305]
2025-12-04T13:04:14.7796063Z 
2025-12-04T13:04:14.7797293Z nn/attention/test_open_registry 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.attention.test_open_registry_1.1_7d49315786a6f063_.log
2025-12-04T13:04:14.7798822Z Running 2 items in this shard: test/nn/attention/test_open_registry.py::TestFlashAttentionRegistry::test_activate_unknown_impl_errors, test/nn/attention/test_open_registry.py::TestFlashAttentionRegistry::test_register_and_activate_impl
2025-12-04T13:04:14.7799839Z 
2025-12-04T13:04:14.7800293Z Finished nn/attention/test_open_registry 1/1 ... [2025-12-04 13:04:14.779312][15094.707602895], took 0.05min
2025-12-04T13:04:14.8114325Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.attention.test_open_registry/nn.attention.test_open_registry-090bcad7e8d69cda.xml
2025-12-04T13:04:14.8400658Z Running xpu/test_fusion 1/1 ... [2025-12-04 13:04:14.839824][15094.768123868]
2025-12-04T13:04:14.8401123Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:04:14.8403903Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'xpu/test_fusion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:04:14.840123]
2025-12-04T13:04:17.9405522Z 
2025-12-04T13:04:17.9406317Z xpu/test_fusion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/xpu.test_fusion_1.1_008b7f4febfa87c6_.log
2025-12-04T13:04:17.9406994Z Running 0 items in this shard:
2025-12-04T13:04:17.9407176Z 
2025-12-04T13:04:17.9407680Z Finished xpu/test_fusion 1/1 ... [2025-12-04 13:04:17.940337][15097.868634219], took 0.05min
2025-12-04T13:04:17.9727884Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/xpu.test_fusion/xpu.test_fusion-5eb21ed31dcf28c7.xml
2025-12-04T13:04:17.9972827Z Running test_foreach 1/1 ... [2025-12-04 13:04:17.997079][15097.925378126]
2025-12-04T13:04:17.9973429Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:04:17.9976219Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_foreach.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:04:17.997365]
2025-12-04T13:14:09.9922552Z 
2025-12-04T13:14:09.9925418Z test_foreach 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_foreach_1.1_76ff14b11afb6f5e_.log
2025-12-04T13:14:10.0914687Z Running 3577 items in this shard: test/test_foreach.py::TestForeachCUDA::test_0dim_tensor_overload_cpu_ok_cuda, test/test_foreach.py::TestForeachCUDA::test_0dim_tensor_overload_exception_cuda, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_add_scalar_with_empty_list_and_empty_tensor_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_atan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_frac_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_norm_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_all_zero_size_tensors_do_not_launch_kernel__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_abs_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_acos_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_add_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcdiv_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_addcmul_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_asin_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_atan_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_ceil_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_max_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_clamp_min_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_copy_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cos_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_cosh_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_div_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erf_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_erfc_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_exp_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_expm1_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_floor_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_frac_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lerp_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_lgamma_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log10_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log1p_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log2_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_log_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_max_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_maximum_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_minimum_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_mul_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_neg_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_norm_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_pow_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_reciprocal_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_round_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_rsqrt_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sigmoid_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sign_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sin_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sinh_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sqrt_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_sub_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tan_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_tanh_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_trunc_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_autodiff__foreach_zero_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_False_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_max_use_cuda_graph_True_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_False_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_big_num_tensors__foreach_norm_use_cuda_graph_True_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_float_inf_nan__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_different_tensor_dtypes__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_scalar_with_overlapping_tensors__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_add_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_max_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_clamp_min_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_div_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_maximum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_minimum_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_mul_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_pow_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_tensors_on_different_devices__foreach_sub_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_div_reciprocal_cuda, test/test_foreach.py::TestForeachCUDA::test_foreach_check_stride_ignore_dims_of_one_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_different_device_inputs__foreach_copy_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_device_inputs__foreach_copy_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_copy_with_multi_dtypes_large_input_cuda, test/test_foreach.py::TestForeachCUDA::test_foreach_l2_large_value_input__foreach_norm_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_l2_large_value_input__foreach_norm_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_max_w_empty_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_foreach_reduce_large_input__foreach_norm_w_empty_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_atan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_copy_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_frac_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_inplace_foreach_leaf_check_and_grad_fn__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_lifetime_of_grad_fn_when_result_is_saved__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_add_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_atan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_clamp_max_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_clamp_min_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_div_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_frac_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_lerp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_maximum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_minimum_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_mul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_pow_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_sub_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_outplace_with_invalid_grads__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_abs_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_acos_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_add_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcdiv_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_addcmul_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_asin_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_atan_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_ceil_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_max_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_clamp_min_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_copy_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cos_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_cosh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_div_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erf_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_erfc_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_exp_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_expm1_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_floor_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_frac_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lerp_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_lgamma_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log10_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log1p_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log2_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_log_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_max_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_maximum_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_minimum_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_mul_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_neg_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_norm_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_pow_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_reciprocal_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_round_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_rsqrt_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sigmoid_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sign_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sin_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sinh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sqrt_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_sub_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tan_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_tanh_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_trunc_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_fastpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_inplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_parity__foreach_zero_slowpath_outplace_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcdiv_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcdiv_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcmul_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_tensors_on_different_devices__foreach_addcmul_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcdiv_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_False_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_pointwise_op_with_tensor_of_scalarlist_overload__foreach_addcmul_is_fastpath_True_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_tensors_grouping_cuda, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_abs_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_acos_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_asin_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_atan_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_ceil_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cos_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_cosh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erf_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_erfc_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_exp_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_expm1_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_floor_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_frac_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_lgamma_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log10_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log1p_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log2_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_log_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_neg_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_reciprocal_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_round_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_rsqrt_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sigmoid_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sign_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sin_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sinh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_sqrt_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tan_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_tanh_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_trunc_cuda_uint8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_bfloat16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_bool, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_complex128, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_complex64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_float64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int16, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int32, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int64, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_int8, test/test_foreach.py::TestForeachCUDA::test_unary_op_tensors_on_different_devices__foreach_zero_cuda_uint8
﻿2025-12-04T13:14:10.1875181Z 
2025-12-04T13:14:10.1875553Z Finished test_foreach 1/1 ... [2025-12-04 13:14:09.996568][15689.924858912], took 9.87min
2025-12-04T13:14:10.1876231Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_foreach/test_foreach-08a78fa61936b219.xml
2025-12-04T13:14:10.1876850Z Running test_pytree 1/1 ... [2025-12-04 13:14:10.159554][15690.087851271]
2025-12-04T13:14:10.1877235Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:14:10.1878017Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_pytree.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:14:10.159874]
2025-12-04T13:14:14.9826764Z 
2025-12-04T13:14:14.9828324Z test_pytree 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_pytree_1.1_d855357a70a16b8f_.log
2025-12-04T13:14:14.9850165Z Running 100 items in this shard: test/test_pytree.py::TestGenericPytree::test_aligned_public_apis, test/test_pytree.py::TestGenericPytree::test_broadcast_to_and_flatten_cxx, test/test_pytree.py::TestGenericPytree::test_broadcast_to_and_flatten_python, test/test_pytree.py::TestGenericPytree::test_enum_treespec_roundtrip_cxx, test/test_pytree.py::TestGenericPytree::test_enum_treespec_roundtrip_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_defaultdict_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_defaultdict_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_deque_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_deque_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_dict_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_dict_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_leaf_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_leaf_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_list_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_list_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_namedtuple_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_namedtuple_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_nested_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_nested_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_ordereddict_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_ordereddict_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_return_types_max_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_return_types_max_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_return_types_min_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_return_types_min_python, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_tuple_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_unflatten_tuple_python, test/test_pytree.py::TestGenericPytree::test_flatten_with_is_leaf_cxx, test/test_pytree.py::TestGenericPytree::test_flatten_with_is_leaf_python, test/test_pytree.py::TestGenericPytree::test_is_namedtuple_cxx, test/test_pytree.py::TestGenericPytree::test_is_namedtuple_python, test/test_pytree.py::TestGenericPytree::test_is_structseq_cxx, test/test_pytree.py::TestGenericPytree::test_is_structseq_python, test/test_pytree.py::TestGenericPytree::test_pytree_serialize_bad_input_cxx, test/test_pytree.py::TestGenericPytree::test_pytree_serialize_bad_input_python, test/test_pytree.py::TestGenericPytree::test_register_pytree_node_cxx, test/test_pytree.py::TestGenericPytree::test_register_pytree_node_python, test/test_pytree.py::TestGenericPytree::test_tree_all_any_cxx, test/test_pytree.py::TestGenericPytree::test_tree_all_any_python, test/test_pytree.py::TestGenericPytree::test_tree_map_cxx, test/test_pytree.py::TestGenericPytree::test_tree_map_dict_order_cxx, test/test_pytree.py::TestGenericPytree::test_tree_map_dict_order_python, test/test_pytree.py::TestGenericPytree::test_tree_map_multi_inputs_cxx, test/test_pytree.py::TestGenericPytree::test_tree_map_multi_inputs_python, test/test_pytree.py::TestGenericPytree::test_tree_map_only_cxx, test/test_pytree.py::TestGenericPytree::test_tree_map_only_predicate_fn_cxx, test/test_pytree.py::TestGenericPytree::test_tree_map_only_predicate_fn_python, test/test_pytree.py::TestGenericPytree::test_tree_map_only_python, test/test_pytree.py::TestGenericPytree::test_tree_map_python, test/test_pytree.py::TestPythonPytree::test_constant, test/test_pytree.py::TestPythonPytree::test_constant_default_eq_error, test/test_pytree.py::TestPythonPytree::test_constant_default_hash_error, test/test_pytree.py::TestPythonPytree::test_dataclass, test/test_pytree.py::TestPythonPytree::test_deprecated_register_pytree_node, test/test_pytree.py::TestPythonPytree::test_flatten_flatten_with_key_consistency, test/test_pytree.py::TestPythonPytree::test_import_pytree_doesnt_import_optree, test/test_pytree.py::TestPythonPytree::test_key_access, test/test_pytree.py::TestPythonPytree::test_key_str, test/test_pytree.py::TestPythonPytree::test_pytree_context_serialize_bad, test/test_pytree.py::TestPythonPytree::test_pytree_custom_type_serialize, test/test_pytree.py::TestPythonPytree::test_pytree_custom_type_serialize_bad, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_bad_protocol, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_defaultdict_enum, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_enum, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_namedtuple, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_namedtuple_bad, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_register_bad, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec0, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec1, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec2, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec3, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec4, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec5, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec6, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec7, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec8, test/test_pytree.py::TestPythonPytree::test_pytree_serialize_spec9, test/test_pytree.py::TestPythonPytree::test_register_dataclass_class, test/test_pytree.py::TestPythonPytree::test_saved_serialized, test/test_pytree.py::TestPythonPytree::test_tree_flatten_with_path_is_leaf, test/test_pytree.py::TestPythonPytree::test_tree_flatten_with_path_roundtrip, test/test_pytree.py::TestPythonPytree::test_tree_leaves_with_path, test/test_pytree.py::TestPythonPytree::test_tree_map_with_path, test/test_pytree.py::TestPythonPytree::test_tree_map_with_path_multiple_trees, test/test_pytree.py::TestPythonPytree::test_treespec_equality, test/test_pytree.py::TestPythonPytree::test_treespec_repr, test/test_pytree.py::TestCxxPytree::test_pytree_custom_type_serialize, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_namedtuple, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec0, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec1, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec2, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec3, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec4, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec5, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec6, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec7, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec8, test/test_pytree.py::TestCxxPytree::test_pytree_serialize_spec9, test/test_pytree.py::TestCxxPytree::test_treespec_equality, test/test_pytree.py::TestCxxPytree::test_treespec_repr
2025-12-04T13:14:14.9869570Z 
2025-12-04T13:14:14.9869749Z Finished test_pytree 1/1 ... [2025-12-04 13:14:14.982529][15694.910819056], took 0.08min
2025-12-04T13:14:15.0148129Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_pytree/test_pytree-863a0f55639901d8.xml
2025-12-04T13:14:15.0507229Z Running test_namedtuple_return_api 1/1 ... [2025-12-04 13:14:15.050473][15694.978770887]
2025-12-04T13:14:15.0507839Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:14:15.0510926Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_namedtuple_return_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:14:15.050810]
2025-12-04T13:14:19.4230176Z 
2025-12-04T13:14:19.4231140Z test_namedtuple_return_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_namedtuple_return_api_1.1_7b0afa1d6fad9127_.log
2025-12-04T13:14:19.4232757Z Running 3 items in this shard: test/test_namedtuple_return_api.py::TestNamedTupleAPI::test_import_return_types, test/test_namedtuple_return_api.py::TestNamedTupleAPI::test_namedtuple_return, test/test_namedtuple_return_api.py::TestNamedTupleAPI::test_native_functions_yaml
2025-12-04T13:14:19.4233847Z 
2025-12-04T13:14:19.4234151Z Finished test_namedtuple_return_api 1/1 ... [2025-12-04 13:14:19.422701][15699.350990319], took 0.07min
2025-12-04T13:14:19.4553955Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_namedtuple_return_api/test_namedtuple_return_api-8cf83ed9877ffdac.xml
2025-12-04T13:14:19.4853017Z Running profiler/test_record_function 1/1 ... [2025-12-04 13:14:19.485054][15699.413352889]
2025-12-04T13:14:19.4853485Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:14:19.4856214Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_record_function.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:14:19.485362]
2025-12-04T13:14:22.7054564Z 
2025-12-04T13:14:22.7055498Z profiler/test_record_function 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_record_function_1.1_27a3a5f054eb4856_.log
2025-12-04T13:14:22.7058306Z Running 6 items in this shard: test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_delegation_with_profiler, test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_with_record_function, test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_with_record_function_fork, test/profiler/test_record_function.py::TestRecordFunction::test_python_dispatch_mode_record_function, test/profiler/test_record_function.py::TestRecordFunction::test_python_subclass_record_function, test/profiler/test_record_function.py::TestRecordFunction::test_record_function
2025-12-04T13:14:22.7060291Z 
2025-12-04T13:14:22.7060529Z Finished profiler/test_record_function 1/1 ... [2025-12-04 13:14:22.705222][15702.633517556], took 0.05min
2025-12-04T13:14:22.7376319Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_record_function/profiler.test_record_function-709377f3af2db71e.xml
2025-12-04T13:14:22.7671487Z Running test_compile_benchmark_util 1/1 ... [2025-12-04 13:14:22.766897][15702.695195294]
2025-12-04T13:14:22.7671969Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:14:22.7674848Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_compile_benchmark_util.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:14:22.767212]
2025-12-04T13:14:30.4528161Z 
2025-12-04T13:14:30.4529195Z test_compile_benchmark_util 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_compile_benchmark_util_1.1_37f34d6b957df075_.log
2025-12-04T13:14:30.4530667Z Running 1 items in this shard: test/test_compile_benchmark_util.py::TestCompileBenchmarkUtil::test_training_and_inference
2025-12-04T13:14:30.4531143Z 
2025-12-04T13:14:30.4531424Z Finished test_compile_benchmark_util 1/1 ... [2025-12-04 13:14:30.452455][15710.380745301], took 0.13min
2025-12-04T13:14:30.4859731Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_compile_benchmark_util/test_compile_benchmark_util-827ec108afd6e51f.xml
2025-12-04T13:14:30.5698589Z Running test_set_default_mobile_cpu_allocator 1/1 ... [2025-12-04 13:14:30.569585][15710.497883589]
2025-12-04T13:14:30.5699137Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:14:30.5702062Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_set_default_mobile_cpu_allocator.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:14:30.569947]
2025-12-04T13:14:33.7403995Z 
2025-12-04T13:14:33.7404983Z test_set_default_mobile_cpu_allocator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_set_default_mobile_cpu_allocator_1.1_d39ef648b820daa5_.log
2025-12-04T13:14:33.7406516Z Running 2 items in this shard: test/test_set_default_mobile_cpu_allocator.py::TestSetDefaultMobileCPUAllocator::test_exception, test/test_set_default_mobile_cpu_allocator.py::TestSetDefaultMobileCPUAllocator::test_no_exception
2025-12-04T13:14:33.7407456Z 
2025-12-04T13:14:33.7407783Z Finished test_set_default_mobile_cpu_allocator 1/1 ... [2025-12-04 13:14:33.740075][15713.66836338], took 0.05min
2025-12-04T13:14:33.7728034Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_set_default_mobile_cpu_allocator/test_set_default_mobile_cpu_allocator-a62b10a07a12c95d.xml
2025-12-04T13:14:33.8048744Z Running test_fake_tensor 1/1 ... [2025-12-04 13:14:33.804637][15713.732936068]
2025-12-04T13:14:33.8049190Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:14:33.8051921Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fake_tensor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:14:33.804939]
2025-12-04T13:14:56.7583172Z 
2025-12-04T13:14:56.7584025Z test_fake_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fake_tensor_1.1_e61dd7d666716729_.log
2025-12-04T13:14:56.7660468Z Running 288 items in this shard: test/test_fake_tensor.py::FakeTensorTest::test__adaptive_avg_pool2d_backward, test/test_fake_tensor.py::FakeTensorTest::test_alias_call, test/test_fake_tensor.py::FakeTensorTest::test_allow_meta, test/test_fake_tensor.py::FakeTensorTest::test_aten_copy_multi_device, test/test_fake_tensor.py::FakeTensorTest::test_aten_index_multi_device, test/test_fake_tensor.py::FakeTensorTest::test_aten_slice_scatter_multi_device, test/test_fake_tensor.py::FakeTensorTest::test_basic, test/test_fake_tensor.py::FakeTensorTest::test_batch_tensor, test/test_fake_tensor.py::FakeTensorTest::test_binary_op_type_promotion, test/test_fake_tensor.py::FakeTensorTest::test_constructor, test/test_fake_tensor.py::FakeTensorTest::test_conv_nhwc, test/test_fake_tensor.py::FakeTensorTest::test_convert_fake_to_real, test/test_fake_tensor.py::FakeTensorTest::test_cpu_fallback, test/test_fake_tensor.py::FakeTensorTest::test_cuda_initialized, test/test_fake_tensor.py::FakeTensorTest::test_cuda_lstm, test/test_fake_tensor.py::FakeTensorTest::test_cudnn_rnn_with_fallback, test/test_fake_tensor.py::FakeTensorTest::test_cudnn_rnn_without_fallback, test/test_fake_tensor.py::FakeTensorTest::test_custom_op_fallback, test/test_fake_tensor.py::FakeTensorTest::test_data_dependent_operator, test/test_fake_tensor.py::FakeTensorTest::test_deepcopy, test/test_fake_tensor.py::FakeTensorTest::test_device_inplace_copy, test/test_fake_tensor.py::FakeTensorTest::test_embedding_bag_meta, test/test_fake_tensor.py::FakeTensorTest::test_export_numpy, test/test_fake_tensor.py::FakeTensorTest::test_fake_device, test/test_fake_tensor.py::FakeTensorTest::test_fake_dispatch_keys, test/test_fake_tensor.py::FakeTensorTest::test_fake_grad_copy, test/test_fake_tensor.py::FakeTensorTest::test_fake_mode_error, test/test_fake_tensor.py::FakeTensorTest::test_fast_div, test/test_fake_tensor.py::FakeTensorTest::test_fast_div_int_to_float, test/test_fake_tensor.py::FakeTensorTest::test_from_numpy, test/test_fake_tensor.py::FakeTensorTest::test_fsdp_flat_param, test/test_fake_tensor.py::FakeTensorTest::test_full, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_complex128, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_complex64, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float32, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float64, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float8_e4m3fn, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float8_e4m3fnuz, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float8_e5m2, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_float8_e5m2fnuz, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_int16, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_int32, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_int64, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_int8, test/test_fake_tensor.py::FakeTensorTest::test_index_cuda_with_cpu_uint8, test/test_fake_tensor.py::FakeTensorTest::test_index_put_error, test/test_fake_tensor.py::FakeTensorTest::test_jagged_fake_to_fake_preserved, test/test_fake_tensor.py::FakeTensorTest::test_like_constructor, test/test_fake_tensor.py::FakeTensorTest::test_mixed_real_and_fake_inputs, test/test_fake_tensor.py::FakeTensorTest::test_mode, test/test_fake_tensor.py::FakeTensorTest::test_nan_to_num, test/test_fake_tensor.py::FakeTensorTest::test_nanmean_out, test/test_fake_tensor.py::FakeTensorTest::test_new, test/test_fake_tensor.py::FakeTensorTest::test_no_tag_func, test/test_fake_tensor.py::FakeTensorTest::test_non_kwarg_device, test/test_fake_tensor.py::FakeTensorTest::test_non_overlapping_stride_zero, test/test_fake_tensor.py::FakeTensorTest::test_non_parameter_grad, test/test_fake_tensor.py::FakeTensorTest::test_normalize_device, test/test_fake_tensor.py::FakeTensorTest::test_op_with_zero_dim_bypassed, test/test_fake_tensor.py::FakeTensorTest::test_out_multi_device, test/test_fake_tensor.py::FakeTensorTest::test_parameter_instantiation, test/test_fake_tensor.py::FakeTensorTest::test_parameter_view, test/test_fake_tensor.py::FakeTensorTest::test_print_in_fake_mode, test/test_fake_tensor.py::FakeTensorTest::test_randperm, test/test_fake_tensor.py::FakeTensorTest::test_recursive_invocation, test/test_fake_tensor.py::FakeTensorTest::test_repr, test/test_fake_tensor.py::FakeTensorTest::test_same_shape_env_preserved, test/test_fake_tensor.py::FakeTensorTest::test_scalar_inputs, test/test_fake_tensor.py::FakeTensorTest::test_scan_reverse_False, test/test_fake_tensor.py::FakeTensorTest::test_scan_reverse_True, test/test_fake_tensor.py::FakeTensorTest::test_setitem, test/test_fake_tensor.py::FakeTensorTest::test_shape_take_not_device, test/test_fake_tensor.py::FakeTensorTest::test_split_return_self, test/test_fake_tensor.py::FakeTensorTest::test_throw, test/test_fake_tensor.py::FakeTensorTest::test_tolist, test/test_fake_tensor.py::FakeTensorTest::test_type_as, test/test_fake_tensor.py::FakeTensorTest::test_unbind_copy_out, test/test_fake_tensor.py::FakeTensorTest::test_unsqueeze_copy, test/test_fake_tensor.py::FakeTensorTest::test_upsample_bilinear_small_channels, test/test_fake_tensor.py::FakeTensorTest::test_zero_dim, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test__adaptive_avg_pool2d_backward_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_alias_call_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_allow_meta_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_aten_copy_multi_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_aten_index_multi_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_aten_slice_scatter_multi_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_basic_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_batch_tensor_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_binary_op_type_promotion_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_constructor_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_conv_nhwc_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_convert_fake_to_real_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cpu_fallback_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cuda_initialized_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cuda_lstm_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cudnn_rnn_with_fallback_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_cudnn_rnn_without_fallback_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_custom_op_fallback_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_data_dependent_operator_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_deepcopy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_device_inplace_copy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_embedding_bag_meta_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_export_numpy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fake_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fake_dispatch_keys_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fake_grad_copy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fake_mode_error_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fast_div_int_to_float_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fast_div_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_from_numpy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_fsdp_flat_param_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_full_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_complex128_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_complex64_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float32_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float64_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float8_e4m3fn_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float8_e4m3fnuz_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float8_e5m2_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_float8_e5m2fnuz_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_int16_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_int32_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_int64_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_int8_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_cuda_with_cpu_uint8_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_index_put_error_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_jagged_fake_to_fake_preserved_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_like_constructor_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_mixed_real_and_fake_inputs_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_mode_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_nan_to_num_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_nanmean_out_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_new_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_no_tag_func_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_non_kwarg_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_non_overlapping_stride_zero_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_non_parameter_grad_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_normalize_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_op_with_zero_dim_bypassed_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_out_multi_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_parameter_instantiation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_parameter_view_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_print_in_fake_mode_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_randperm_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_recursive_invocation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_repr_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_same_shape_env_preserved_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_scalar_inputs_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_scan_reverse_False_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_scan_reverse_True_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_setitem_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_shape_take_not_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_split_return_self_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_throw_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_tolist_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_type_as_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_unbind_copy_out_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_unsqueeze_copy_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_upsample_bilinear_small_channels_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorTest::test_zero_dim_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorConstHandling::test_aliased_const_write, test/test_fake_tensor.py::FakeTensorConstHandling::test_constant_invalidation, test/test_fake_tensor.py::FakeTensorConstHandling::test_constant_propagate_through_functions, test/test_fake_tensor.py::FakeTensorConstHandling::test_fake_tensor_batch_norm_cpu, test/test_fake_tensor.py::FakeTensorConstHandling::test_fake_tensor_in_intlist_repro, test/test_fake_tensor.py::FakeTensorConstHandling::test_inplace_add, test/test_fake_tensor.py::FakeTensorConstHandling::test_inplace_view_invalidation, test/test_fake_tensor.py::FakeTensorConstHandling::test_shared_storage_invalidation, test/test_fake_tensor.py::FakeTensorConstHandling::test_shared_storages, test/test_fake_tensor.py::FakeTensorConstHandling::test_simple, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_aliased_const_write_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_constant_invalidation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_constant_propagate_through_functions_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_fake_tensor_batch_norm_cpu_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_fake_tensor_in_intlist_repro_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_inplace_add_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_inplace_view_invalidation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_shared_storage_invalidation_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_shared_storages_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConstHandling::test_simple_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyCatCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyCubeCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyMulCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyMulScalarCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyNMSCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyNonzeroCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpySortCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpySplitCopyCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpySplitCopyWithIntCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyTakeCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorOpInfoTestCUDA::test_fake_NumpyViewCopyCustomOp_cuda_float32, test/test_fake_tensor.py::FakeTensorConverterTest::test_dead_key, test/test_fake_tensor.py::FakeTensorConverterTest::test_dead_weak_ref, test/test_fake_tensor.py::FakeTensorConverterTest::test_memoized_conversion_from_meta, test/test_fake_tensor.py::FakeTensorConverterTest::test_memoized_conversion_to_meta, test/test_fake_tensor.py::FakeTensorConverterTest::test_multiple_modes, test/test_fake_tensor.py::FakeTensorConverterTest::test_no_active_mode, test/test_fake_tensor.py::FakeTensorConverterTest::test_no_ref_cycle, test/test_fake_tensor.py::FakeTensorConverterTest::test_separate_mode_error, test/test_fake_tensor.py::FakeTensorConverterTest::test_separate_tensor_storages_non_view, test/test_fake_tensor.py::FakeTensorConverterTest::test_separate_tensor_storages_view, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_dead_key_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_dead_weak_ref_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_memoized_conversion_from_meta_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_memoized_conversion_to_meta_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_multiple_modes_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_no_active_mode_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_no_ref_cycle_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_separate_mode_error_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_separate_tensor_storages_non_view_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorConverterTest::test_separate_tensor_storages_view_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_conv_c1_backward, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_cross_entropy_loss, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_embedding_bag_private, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_fake_gpu_no_init, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_flash_attention, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_like_ops, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_module_to, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_move_meta_tensor, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_move_module_under_fake, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_no_dispatch_with_like_function, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_non_kwarg_only_device, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_sparse_new, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_str_storage, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_tensor_constructors_all_have_kwarg_device, test/test_fake_tensor.py::FakeTensorOperatorInvariants::test_tensor_new, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_conv_c1_backward_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_cross_entropy_loss_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_embedding_bag_private_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_fake_gpu_no_init_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_flash_attention_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_like_ops_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_module_to_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_move_meta_tensor_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_move_module_under_fake_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_no_dispatch_with_like_function_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_non_kwarg_only_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_sparse_new_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_str_storage_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_tensor_constructors_all_have_kwarg_device_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorOperatorInvariants::test_tensor_new_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorPropTest::test_fake_tensor_prop_on_nn_module, test/test_fake_tensor.py::FakeTensorPropTest::test_fake_tensor_prop_on_nn_module_with_optional_args, test/test_fake_tensor.py::FakeTensorPropTest::test_nan_to_num, test/test_fake_tensor.py::FakeTensorPropTest::test_nonzero_stride, test/test_fake_tensor.py::FakeTensorPropTest::test_torch_load_with_fake_mode, test/test_fake_tensor.py::FakeTensorPropTest::test_unbacked_shape_realloc, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_fake_tensor_prop_on_nn_module_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_fake_tensor_prop_on_nn_module_with_optional_args_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_nan_to_num_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_nonzero_stride_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_torch_load_with_fake_mode_propagate_real_tensors, test/test_fake_tensor.py::PropagateRealTensorsFakeTensorPropTest::test_unbacked_shape_realloc_propagate_real_tensors, test/test_fake_tensor.py::FakeTensorSerialization::test_serialization, test/test_fake_tensor.py::FakeTensorSerialization::test_serialization_with_tracing, test/test_fake_tensor.py::FakeTensorDispatchCache::test__upsample_bilinear2d_aa_backward_dynamic_shapes, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_aten_index, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_bypass, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_default_device, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_default_dtype, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_dispatch_key_set, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_hit, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_inplace_op, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_constants, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_device, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_dtype, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_is_conj, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_is_inference, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_is_neg, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_memory_format, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_requires_grad, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_shape, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_storage_offset, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_key_stride, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_tuple_outputs, test/test_fake_tensor.py::FakeTensorDispatchCache::test_cache_view_op, test/test_fake_tensor.py::FakeTensorDispatchCache::test_empty_list, test/test_fake_tensor.py::FakeTensorDispatchCache::test_fft_hfft2_issue145522, test/test_fake_tensor.py::FakeTensorDispatchCache::test_from_buffer, test/test_fake_tensor.py::FakeTensorDispatchCache::test_inference_mode, test/test_fake_tensor.py::FakeTensorDispatchCache::test_invoke_subgraph, test/test_fake_tensor.py::FakeTensorDispatchCache::test_invoke_subgraph_cacheable_inplace, test/test_fake_tensor.py::FakeTensorDispatchCache::test_meta_tensor_to_fake_cpu, test/test_fake_tensor.py::FakeTensorDispatchCache::test_shape_env_settings, test/test_fake_tensor.py::FakeTensorDispatchCache::test_unbacked_output, test/test_fake_tensor.py::FakeTensorDispatchCache::test_wrapper_tensor_subclass_different_device, test/test_fake_tensor.py::FakeTensorPreferDeviceType::test_fake_tensor_prefer_device_type, test/test_fake_tensor.py::FakeTensorPreferDeviceType::test_fake_tensor_prefer_device_type_cpu_only
2025-12-04T13:14:56.7734114Z 
2025-12-04T13:14:56.7734320Z Finished test_fake_tensor 1/1 ... [2025-12-04 13:14:56.758572][15736.686865384], took 0.38min
2025-12-04T13:14:56.7914790Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_fake_tensor/test_fake_tensor-aa30317e19bcd391.xml
2025-12-04T13:14:56.8838776Z Running test_binary_ufuncs 1/1 ... [2025-12-04 13:14:56.883637][15736.81193457]
2025-12-04T13:14:56.8839228Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:14:56.8842418Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_binary_ufuncs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:14:56.883976]
2025-12-04T13:17:35.6066074Z 
2025-12-04T13:17:35.6067100Z test_binary_ufuncs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_binary_ufuncs_1.1_62e3371632b5f5b8_.log
2025-12-04T13:17:35.9829779Z Running 12917 items in this shard: test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___add___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___and___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___eq___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___floordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ge___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___gt___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iadd___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___iand___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ifloordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ilshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___imul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ior___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ipow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___irshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___isub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___itruediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ixor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___le___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___lt___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___matmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___mul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ne___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___or___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___pow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___radd___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rand___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rfloordiv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rlshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmatmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmod___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rmul___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___ror___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rpow___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rrshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rshift___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rsub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rtruediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___rxor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___sub___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___truediv___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test___xor___not_implemented_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_broadcast_empty_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_add_with_tail_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_addcmul_scalars_as_floats_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_addsub_half_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_atan2_edgecases_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_batch_vs_slicing_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_op_mem_overlap_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_op_scalar_device_unspecified_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_binary_ops_with_scalars_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bitwise_ops_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_bool_tensor_comparison_ops_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_broadcasting_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cdiv_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cmul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_comparison_ops_type_promotion_and_broadcasting_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_div_underflow_overflow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_div_underflow_overflow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_complex_scalar_pow_tensor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_large_dim_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_size1_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_every_other_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_contig_vs_transposed_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_copysign_subgradient_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cpow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cpu_tensor_pow_cuda_scalar_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cremainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cross_device_binary_ops_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cross_device_inplace_error_msg_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_csub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cuda_tensor_pow_scalar_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_cumulative_trapezoid_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_and_floordiv_script_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_and_floordiv_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_modes_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_nonfinite_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_div_rounding_numpy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divide_by_zero_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_divmul_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_power_exceptions_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_scalar_pow_float_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_float_scalar_pow_float_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_div_extremal_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_int_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_scalar_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_tensor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_floor_divide_zero_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_float_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_by_zero_integral_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_fmod_remainder_overflow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_complex_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cross_device_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_heaviside_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_idiv_and_ifloordiv_vs_python_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_inplace_division_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_inplace_dunders_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_int_and_float_pow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_int_tensor_pow_neg_ints_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_ldexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cpu_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cpu_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_lowp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_scalar_tensor_promotion_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_lerp_weight_tensor_promotion_error_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_and_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_or_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_logical_xor_with_nontrivial_alignment_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_long_tensor_pow_floats_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_and_minimum_subgradient_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex128_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_complex_cuda_complex64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_cross_device_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_float_nan_and_inf_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_forward_ad_float32_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_int_and_bool_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bfloat16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_maximum_minimum_type_promotion_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_min_max_binary_op_nan_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_chalf_tensor_and_cpu_scalar_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_mul_intertype_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_muldiv_scalar_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_bfloat16_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_expand_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___radd___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rand___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rdiv___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmod___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rmul___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___ror___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rpow___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rsub___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index___rxor___cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs__conversions_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index__refs_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_atan2_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_left_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_right_shift_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_complex_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_copysign_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_floor_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_no_rounding_mode_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_div_trunc_rounding_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmax_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmin_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_hypot_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igamma_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_igammac_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_index_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ldexp_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logaddexp_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_mul_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_nextafter_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_polar_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_rsub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_h_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_hermite_polynomial_he_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_laguerre_polynomial_l_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_legendre_polynomial_p_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_xlog1py_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_special_zeta_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_true_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_non_contig_xlogy_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___radd___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rdiv___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rmod___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rmul___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rpow___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable___rsub___cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs__conversions_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs__conversions_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable__refs_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_atan2_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_complex_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_copysign_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_floor_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_no_rounding_mode_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_div_trunc_rounding_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmax_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmin_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_hypot_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_igamma_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_igammac_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ldexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logaddexp_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_mul_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_nextafter_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_polar_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_rsub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_hermite_polynomial_h_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_hermite_polynomial_he_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_laguerre_polynomial_l_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_legendre_polynomial_p_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_xlog1py_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_special_zeta_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_true_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_not_broadcastable_xlogy_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_out_resize_warning_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex_extremal_passing_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_complex_extremal_passing_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_inplace_resizing_exception_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_base_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_overloads_mem_overlap_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_pow_scalar_type_promotion_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rdiv_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_extremal_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_large_values_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values__refs_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_add_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_bitwise_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_max_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_clamp_min_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_eq_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_float_power_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_floor_divide_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_fmod_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gcd_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ge_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_gt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_heaviside_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_isclose_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_jiterator_binary_return_by_ref_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lcm_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_le_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_and_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_or_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_logical_xor_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_lt_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_max_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_maximum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_min_binary_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_minimum_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_ne_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_pow_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_remainder_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_small_values_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_reference_numerics_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_fmod_large_dividend_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_fmod_large_dividend_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_remainder_overflow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_rpow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support__refs_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_add_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_bitwise_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_max_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_max_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_min_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_clamp_min_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_eq_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_float_power_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_floor_divide_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_floor_divide_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_fmod_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_fmod_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gcd_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ge_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ge_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_gt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_heaviside_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_heaviside_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_isclose_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_jiterator_binary_return_by_ref_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lcm_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_le_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_le_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_and_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_or_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_logical_xor_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lt_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_lt_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_max_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_max_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_maximum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_maximum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_min_binary_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_min_binary_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_minimum_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_minimum_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_ne_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_pow_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_remainder_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_remainder_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_scalar_support_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_shift_limits_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_signed_shift_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_complex128, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_complex64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_cuda_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_sub_typing_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_tensor_pow_tensor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_trapezoid_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_true_divide_out_cuda_bfloat16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_true_divide_out_cuda_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___radd___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rand___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rdiv___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rmod___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rmul___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___ror___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rpow___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rsub___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion___rxor___cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs__conversions_complex_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs__conversions_polar_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_left_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_right_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_bitwise_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_clamp_max_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_clamp_min_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_copysign_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_floor_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_no_rounding_mode_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_div_trunc_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_eq_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_float_power_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_floor_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmax_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmin_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_fmod_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_gcd_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_ge_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_gt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_heaviside_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_hypot_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_igamma_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_igammac_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_isclose_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_lcm_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_le_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logaddexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_logical_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_lt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_maximum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_minimum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_ne_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_nextafter_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_pow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_remainder_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_rsub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_special_xlog1py_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_special_zeta_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_sub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_true_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion__refs_xlogy_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_add_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_atan2_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_left_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_right_shift_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_bitwise_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_clamp_max_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_clamp_min_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_complex_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_copysign_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_floor_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_no_rounding_mode_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_div_trunc_rounding_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_eq_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_float_power_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_floor_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmax_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmin_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_fmod_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_gcd_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ge_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_gt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_heaviside_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_hypot_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_igamma_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_igammac_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_isclose_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_jiterator_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_jiterator_binary_return_by_ref_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_lcm_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ldexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_le_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logaddexp_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_and_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_or_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_logical_xor_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_lt_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_max_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_maximum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_min_binary_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_minimum_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_mul_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_ne_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_nextafter_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_polar_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_pow_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_remainder_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_rsub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_t_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_u_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_v_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_chebyshev_polynomial_w_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_hermite_polynomial_h_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_hermite_polynomial_he_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_laguerre_polynomial_l_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_legendre_polynomial_p_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_t_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_u_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_v_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_shifted_chebyshev_polynomial_w_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_xlog1py_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_special_zeta_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_sub_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_true_divide_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_type_promotion_xlogy_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_bfloat16_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_cuda_uint8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_gradients_cuda_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_xlogy_xlog1py_scalar_type_promotion_cuda, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_bool_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_float64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int16_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int32_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int64_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_int8_uint8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_bool, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_float32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_float64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int16, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int32, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int64, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_int8, test/test_binary_ufuncs.py::TestBinaryUfuncsCUDA::test_zeta_cuda_uint8_uint8
2025-12-04T13:17:36.3341375Z 
2025-12-04T13:17:36.3341765Z Finished test_binary_ufuncs 1/1 ... [2025-12-04 13:17:35.624056][15895.552345694], took 2.65min
2025-12-04T13:17:36.3343037Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_binary_ufuncs/test_binary_ufuncs-6264828c6395375f.xml
2025-12-04T13:17:36.3344190Z Running test_meta 2/4 ... [2025-12-04 13:17:35.927331][15895.8556273]
2025-12-04T13:17:36.3344687Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:17:36.3345889Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_meta.py', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:17:35.927699]
2025-12-04T13:45:51.4353286Z 
2025-12-04T13:45:51.4354368Z test_meta 2/4 was successful, full logs can be found in artifacts with path test/test-reports/test_meta_2.4_5af3c72f6896ae42_.log
2025-12-04T13:45:51.6815292Z Running 10352 items in this shard: test/test_meta.py::TestMetaConverter::test_channels_last, test/test_meta.py::TestMetaConverter::test_channels_last_leaf, test/test_meta.py::TestMetaConverter::test_view_dtype, test/test_meta.py::TestMetaCUDA::test_batch_norm_backward_output_mask3_cuda, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs__conversions_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs__conversions_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_hermite_polynomial_he_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmatmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___ror___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__segment_reduce_lengths_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__segment_reduce_lengths_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__segment_reduce_offsets_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__upsample_bilinear2d_aa_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcdiv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_allclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bincount_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_left_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_right_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_right_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_inverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_count_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dist_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dist_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_einsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_einsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfinv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geqrf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hash_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_heaviside_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_heaviside_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_heaviside_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_imag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_inner_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_inner_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lcm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eig_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_householder_product_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vecdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vector_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vector_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logcumsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matrix_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_alpha_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_elu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_embedding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_gaussian_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_gelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_glu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardtanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hinge_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_instance_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_instance_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_area_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_trilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_kl_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_l1_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_local_response_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_logsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mse_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rms_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rrelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_selu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_silu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_threshold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_threshold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_fro_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_nuc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_number_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pca_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_remainder_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_renorm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_renorm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_neg_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_kaiser_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_mm_reduce_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_airy_ai_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_log_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensordot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensordot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensordot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unravel_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unravel_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rand___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___ror___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rxor___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__batch_norm_with_update_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__native_batch_norm_legit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__segment_reduce_lengths_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__softmax_backward_data_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_decomposed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_decomposed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_allclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_allclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_baddbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_left_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_left_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cauchy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_inverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_inverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ge_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_histc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hypot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_igammac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_inner_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_inner_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_inner_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_item_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lcm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_le_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_le_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_le_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvalsh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_multi_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_multi_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svdvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svdvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_tensorsolve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_tensorsolve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logcumsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logcumsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logcumsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logcumsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_unpack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mH_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matrix_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_matrix_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_multinomial_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nextafter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nextafter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_alpha_dropout_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_celu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_similarity_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_ctc_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_elu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_glu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_grid_sample_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_group_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_group_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardswish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_l1_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_local_response_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_one_hot_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_selu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_silu_complex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_smooth_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softplus_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_nuc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_in_place_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_number_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ormqr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rand_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_searchsorted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_general_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_kaiser_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sparse_mm_reduce_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_h_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_he_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_he_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_log_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_svd_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_sparse_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triangular_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triangular_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unravel_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unravel_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___ror___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rpow___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rxor___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rxor___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__batch_norm_with_update_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__batch_norm_with_update_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_frac_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_lengths_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_lengths_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__softmax_backward_data_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_decomposed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_T_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__batch_norm_with_update_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__segment_reduce_lengths_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_all_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_as_strided_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_broadcast_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cfloat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_chalf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cholesky_inverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_conj_physical_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_copysign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_div_trunc_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_empty_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_flatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_item_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ldexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_eig_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_lstsq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_matrix_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_solve_triangular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logaddexp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mH_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_native_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_conv_transpose2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_ctc_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_hardtanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_interpolate_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_interpolate_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_leaky_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_one_hot_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_pad_replicate_negative_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_rms_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_softplus_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_permute_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_roll_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scatter_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_general_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_entr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_sum_to_size_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_svd_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_trace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_transpose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_true_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bernoulli_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bernoulli_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bincount_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_left_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_right_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bucketize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cauchy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_inverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cov_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dist_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_power_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geqrf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_heaviside_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_histc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_histc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_igammac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_imag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_imag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isposinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isposinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lcm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_det_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvalsh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_slogdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_triangular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_tensorsolve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vector_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logcumsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matrix_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_pool2d_with_indices_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_pool2d_with_indices_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nextafter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_celu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_celu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_similarity_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_similarity_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_ctc_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_elu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_elu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_bag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_gelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_group_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hinge_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bicubic_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_kl_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_leaky_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_leaky_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_local_response_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_local_response_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mse_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mse_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_prelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_prelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rms_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rrelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_selu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_upsample_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_nuc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_in_place_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_number_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ormqr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pca_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polar_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rand_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_remainder_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_remainder_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_remainder_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_neg_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_kaiser_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_nuttall_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_sampled_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_airy_ai_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_erfcx_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_legendre_polynomial_p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_log_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtri_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtri_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_square_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensordot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensordot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensordot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triangular_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_uint64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__native_batch_norm_legit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcdiv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addmm_decomposed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_as_strided_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_atleast_1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_chalf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_clamp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_dstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_einsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_expand_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_grid_sampler_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_jiterator_binary_return_by_ref_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_kron_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_multi_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_tensorsolve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_new_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_channel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_conv2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_hardsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_instance_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_bicubic_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool2d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_rms_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_upsample_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_polygamma_polygamma_n_2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_polygamma_polygamma_n_4_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_resize__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_roll_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_searchsorted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_blackman_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_general_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_kaiser_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_ndtri_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_square_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_torch__scaled_mm_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_trapz_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_angle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bernoulli_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bincount_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bincount_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bucketize_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cauchy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_inverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_einsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_einsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flip_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geqrf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geqrf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_inner_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_inner_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_istft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lcm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eig_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvalsh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_householder_product_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_multi_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_multi_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_multi_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_slogdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_triangular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_svdvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_tensorinv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vector_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logcumsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matrix_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmedian_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmedian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_batch_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ne_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nextafter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nextafter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nextafter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_alpha_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_celu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_dropout2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_elu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_gelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardswish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_instance_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_area_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_trilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_leaky_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_leaky_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_selu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_upsample_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_upsample_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_upsample_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_upsample_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_fro_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_number_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ormqr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pinverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_interleave_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reshape_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_neg_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_searchsorted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_searchsorted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_hann_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_kaiser_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_nuttall_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_mm_reduce_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_h_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_he_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_he_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_hermite_polynomial_he_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_log_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_svd_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensordot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triangular_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_uniform_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_group_norm_backward_output_mask1_cuda, test/test_meta.py::TestMetaCUDA::test_inplace_bin_ops_error_cuda, test/test_meta.py::TestMetaCUDA::test_layer_norm_backward_output_mask0_cuda, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace___getitem___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rand___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rand___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rxor___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rxor___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rxor___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__batch_norm_with_update_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_div_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__native_batch_norm_legit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_lengths_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_offsets_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__softmax_backward_data_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_decomposed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_baddbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_left_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_right_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_block_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cauchy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_inverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_contiguous_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumulative_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_trunc_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_trunc_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_einsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_float8_e5m2fnuz, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gather_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gcd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geometric_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geqrf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gradient_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hypot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hypot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_put_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lcm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eig_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvalsh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_householder_product_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_inv_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_singular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_slogdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_solve_triangular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_tensorinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logaddexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logcumsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logcumsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logcumsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lu_unpack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matrix_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_pool2d_with_indices_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_median_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nansum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_dropout_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ne_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_similarity_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_elu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_embedding_bag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_embedding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_gelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardswish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardtanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_leaky_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_local_response_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_local_response_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mse_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_one_hot_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_rrelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_complex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_upsample_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_fro_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_nuc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_in_place_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_in_place_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_in_place_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_number_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pca_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pca_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pinverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_renorm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_roll_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_neg_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_general_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_kaiser_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_kaiser_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_nuttall_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_sampled_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_airy_ai_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_he_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_he_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_he_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_laguerre_polynomial_l_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_laguerre_polynomial_l_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_to_size_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensordot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensordot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch__scaled_mm_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triangular_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_uniform_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_uniform_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unravel_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_T_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmatmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmod___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___ror___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rxor___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__batch_norm_with_update_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__native_batch_norm_legit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_lengths_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_lengths_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_offsets_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__segment_reduce_offsets_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__softmax_backward_data_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__upsample_bilinear2d_aa_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_decomposed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmm_decomposed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_baddbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bernoulli_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bincount_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bincount_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_right_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_right_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_right_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bucketize_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bucketize_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cauchy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chalf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dist_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dist_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_floor_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_trunc_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_einsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_permuted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_equal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exponential_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float8_e5m2, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_floor_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geometric_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_geqrf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_histc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_histc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hypot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hypot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_igamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_imag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_inner_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_inner_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isposinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_unary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lcm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cond_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_det_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eig_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvalsh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_householder_product_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_multi_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_multi_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_multi_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_triangular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_svdvals_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_tensorsolve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vander_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vector_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matrix_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matrix_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matrix_exp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_pool2d_with_indices_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_pool2d_with_indices_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmedian_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanquantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_dropout_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nextafter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_elu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_elu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_grid_sample_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_group_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_area_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bicubic_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bicubic_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_trilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_local_response_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_mish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multi_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_prelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rms_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rms_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rrelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rrelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_smooth_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softplus_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_upsample_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_fro_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_fro_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_nuc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_in_place_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_in_place_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_number_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ormqr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pca_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pca_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pinverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polar_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_neg_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_searchsorted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_general_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_general_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_general_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_kaiser_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_kaiser_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_mm_reduce_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_h_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_h_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_he_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_log_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_log_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_log_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_uniform_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_uint64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_stride_for_index_Tensor_cuda
﻿2025-12-04T13:45:51.9208215Z 
2025-12-04T13:45:51.9208419Z Finished test_meta 2/4 ... [2025-12-04 13:45:51.446836][17591.375127138], took 28.26min
2025-12-04T13:45:51.9209048Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_meta/test_meta-be563441d2ad6907.xml
2025-12-04T13:45:53.0403011Z Uploading artifacts took 1.29 seconds
2025-12-04T13:45:53.0406602Z Running test_fx 1/1 ... [2025-12-04 13:45:53.040472][17592.968768044]
2025-12-04T13:45:53.0406944Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:45:53.0410872Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:45:53.040865]
2025-12-04T13:48:51.7397932Z 
2025-12-04T13:48:51.7398652Z test_fx 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_1.1_c9e7dc6459df6851_.log
2025-12-04T13:48:51.7726290Z Running 1280 items in this shard: test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationInput_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationInput_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationMetadata_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationMetadata_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationTorchTensorCall_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationTorchTensorCall_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_Mutation_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_Mutation_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_ReturnList_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_ReturnList_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_TakeList_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_TakeList_cuda, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_FactoryFunctionCall_cpu, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_FactoryFunctionCall_cuda, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_MutationFactory_cpu, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_MutationFactory_cuda, test/test_fx.py::TestCSEPass::test_banned_list, test/test_fx.py::TestCSEPass::test_empty, test/test_fx.py::TestCSEPass::test_immutable_list_multiple_entries, test/test_fx.py::TestCSEPass::test_immutable_list_type, test/test_fx.py::TestCSEPass::test_kwarg, test/test_fx.py::TestCSEPass::test_nested_immutable_list_type, test/test_fx.py::TestCSEPass::test_nochange, test/test_fx.py::TestCSEPass::test_rand_like, test/test_fx.py::TestCSEPass::test_rand_n, test/test_fx.py::TestCSEPass::test_random, test/test_fx.py::TestCSEPass::test_simple, test/test_fx.py::TestCSEPass::test_simple_2, test/test_fx.py::TestCSEPass::test_simple_multiple_same_ops, test/test_fx.py::TestCSEPass::test_two_args, test/test_fx.py::TestCSEPass::test_two_args_default, test/test_fx.py::TestDCE::test_dead_chain, test/test_fx.py::TestDCE::test_dead_getattr, test/test_fx.py::TestDCE::test_dead_placeholder, test/test_fx.py::TestDCE::test_dead_placeholder_with_user, test/test_fx.py::TestDCE::test_impure_custom, test/test_fx.py::TestDCE::test_impure_kwargs, test/test_fx.py::TestDCE::test_impure_nodes_args, test/test_fx.py::TestDCE::test_impure_random, test/test_fx.py::TestDCE::test_keep_collectives, test/test_fx.py::TestDCE::test_keep_collectives_no_overload, test/test_fx.py::TestDCE::test_keep_module_with_side_effects, test/test_fx.py::TestDCE::test_keep_setitem, test/test_fx.py::TestDCE::test_keep_torch_assert, test/test_fx.py::TestDCE::test_simple, test/test_fx.py::TestConstFold::test_check_inline_non_const, test/test_fx.py::TestConstFold::test_check_inline_non_const_mult_return, test/test_fx.py::TestConstFold::test_check_skip_folding_quant_dequant_pattern, test/test_fx.py::TestConstFold::test_const_fold_basic_one_attr_name_collision, test/test_fx.py::TestConstFold::test_const_fold_basic_one_attr_no_name_collision, test/test_fx.py::TestConstFold::test_const_fold_basic_placeholder_reordered, test/test_fx.py::TestConstFold::test_const_fold_basic_two_attr, test/test_fx.py::TestConstFold::test_const_fold_basic_two_attr_three_input, test/test_fx.py::TestConstFold::test_const_fold_has_inlined_call_module_node, test/test_fx.py::TestConstFold::test_const_fold_module_attr, test/test_fx.py::TestConstFold::test_const_fold_multi_const_folded_attrs, test/test_fx.py::TestConstFold::test_const_fold_noop, test/test_fx.py::TestConstFold::test_const_fold_partial_graph, test/test_fx.py::TestConstFold::test_const_fold_submod_hierarchy, test/test_fx.py::TestConstFold::test_const_fold_tensor_meta, test/test_fx.py::TestConstFold::test_const_fold_unused_placeholder, test/test_fx.py::TestConstFold::test_dict_output, test/test_fx.py::TestConstFold::test_do_not_fold_impure_subgraph, test/test_fx.py::TestConstFold::test_fold_module, test/test_fx.py::TestConstFold::test_fold_pure_subgraph, test/test_fx.py::TestConstFold::test_retain_node_meta, test/test_fx.py::TestConstFold::test_three_outputs, test/test_fx.py::TestConstFold::test_two_outputs, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_dim_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_ndim_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_nelement_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_numel_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_shape_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_size_const, test/test_fx.py::AnnotationsTest::test_annotate, test/test_fx.py::AnnotationsTest::test_annotations, test/test_fx.py::AnnotationsTest::test_broadcasting1, test/test_fx.py::AnnotationsTest::test_broadcasting2, test/test_fx.py::AnnotationsTest::test_broadcasting3, test/test_fx.py::AnnotationsTest::test_consistency, test/test_fx.py::AnnotationsTest::test_precision, test/test_fx.py::TypeCheckerTest::test_flatten_fully_static, test/test_fx.py::TypeCheckerTest::test_resnet50, test/test_fx.py::TypeCheckerTest::test_symbolic_add_with_broadcast, test/test_fx.py::TypeCheckerTest::test_symbolic_add_with_broadcast_2, test/test_fx.py::TypeCheckerTest::test_type_check_add_false, test/test_fx.py::TypeCheckerTest::test_type_check_add_true, test/test_fx.py::TypeCheckerTest::test_type_check_add_with_broadcast, test/test_fx.py::TypeCheckerTest::test_type_check_add_with_scalar, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D_broadcast, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D_false, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_symbolic, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_2, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_2_fully_static, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_maxpool2d_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_types, test/test_fx.py::TypeCheckerTest::test_type_check_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_flatten3, test/test_fx.py::TypeCheckerTest::test_type_check_flatten_2, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_true, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_true_param_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_true, test/test_fx.py::TypeCheckerTest::test_type_check_symbolic_inferenceconv2D_maxpool2d_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_transpose_False, test/test_fx.py::TypeCheckerTest::test_type_check_transpose_true, test/test_fx.py::TypeCheckerTest::test_type_maxpool2d_fully_static, test/test_fx.py::TypeCheckerTest::test_type_typechecl_maxpool2d_3dinput, test/test_fx.py::TypeCheckerTest::test_typecheck_basicblock, test/test_fx.py::TestMatcher::test_matcher_with_name_node_map_function, test/test_fx.py::TestMatcher::test_matcher_with_name_node_map_module, test/test_fx.py::TestMatcher::test_split_to_graph_and_name_node_map, test/test_fx.py::TestMatcher::test_subgraph_matcher_ignore_literals, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_attributes, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_list, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_list_bad, test/test_fx.py::TestMatcher::test_variatic_arg_matching, test/test_fx.py::TestPassManager::test_pass_manager, test/test_fx.py::TestPassManager::test_pass_manager_bad_checks, test/test_fx.py::TestPassManager::test_pass_manager_checks, test/test_fx.py::TestPassManager::test_pass_manager_error, test/test_fx.py::TestPassManager::test_this_before_that_pass_constraint, test/test_fx.py::TestPassManager::test_topological_sort, test/test_fx.py::TestSourceMatcher::test_legalize_slice, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_weight_tied_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_weight_tied_strict_True, test/test_fx.py::TestSubgraphRewriter::test_matching_pattern_with_list_type_arg, test/test_fx.py::TestSubgraphRewriter::test_matching_variable_arguments, test/test_fx.py::TestSubgraphRewriter::test_replace_pattern_with_callback, test/test_fx.py::TestSubgraphRewriter::test_replace_pattern_with_filters, test/test_fx.py::TestSubgraphRewriter::test_replaced_nodes, test/test_fx.py::TestSubgraphRewriter::test_replacement_with_attrs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_annotations_int, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_call_method, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_correct_output_replacement, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_graph_argument_order, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_internal_pattern_nodes_cannot_have_users_that_are_not_matched, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_local_revert, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_multiple_pattern_match, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_nodes_with_kwargs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_pattern_is_entire_graph, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_pattern_output_pattern_node_can_have_users_that_are_not_matched, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_placeholder_matching, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_preserves_logic, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_consecutive_submodules, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_with_duplicated_outputs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_with_multiple_outputs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replaces_referenced_submodules, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_single_pattern_match, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_traced_as_callable, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_oneliner_pattern, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_overlapping_matches, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_trivial_replacement, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_unused_args, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_unused_results, test/test_fx.py::TestFX::test_all_input_nodes, test/test_fx.py::TestFX::test_annotation_with_future, test/test_fx.py::TestFX::test_annotations_empty_tuple, test/test_fx.py::TestFX::test_annotations_with_forward_references, test/test_fx.py::TestFX::test_annotations_with_no_forward_references, test/test_fx.py::TestFX::test_annotations_with_non_torch_reference_and_internal_forward_references, test/test_fx.py::TestFX::test_annotations_with_non_torch_reference_and_no_internal_forward_references, test/test_fx.py::TestFX::test_args_kwargs, test/test_fx.py::TestFX::test_args_kwargs_no_self, test/test_fx.py::TestFX::test_ast_rewriter_reassigns_submodules, test/test_fx.py::TestFX::test_ast_rewriter_rewrites_assert, test/test_fx.py::TestFX::test_ast_rewriter_rewrites_assert_with_message, test/test_fx.py::TestFX::test_ast_rewriter_wrap, test/test_fx.py::TestFX::test_ast_rewriter_wrap_fn_directly, test/test_fx.py::TestFX::test_ast_rewriter_wrap_with_submodule, test/test_fx.py::TestFX::test_ast_rewriter_wrapped_via_decorator, test/test_fx.py::TestFX::test_ast_rewriter_wrapped_via_decorator_and_transformed, test/test_fx.py::TestFX::test_autowrap_functions, test/test_fx.py::TestFX::test_concrete_arg_none_assert, test/test_fx.py::TestFX::test_construct_root_dict, test/test_fx.py::TestFX::test_control_flow_tracing, test/test_fx.py::TestFX::test_copy_it, test/test_fx.py::TestFX::test_copy_no_remap, test/test_fx.py::TestFX::test_ctx_mgr, test/test_fx.py::TestFX::test_custom_codegen, test/test_fx.py::TestFX::test_custom_codegen_with_transformer, test/test_fx.py::TestFX::test_custom_import, test/test_fx.py::TestFX::test_custom_proxy_dynamic_value, test/test_fx.py::TestFX::test_custom_proxy_input_dependent_control_flow, test/test_fx.py::TestFX::test_custom_proxy_type, test/test_fx.py::TestFX::test_custom_proxy_type_literal, test/test_fx.py::TestFX::test_custom_traceback_not_raised_when_exception_source_is_submodule, test/test_fx.py::TestFX::test_custom_traceback_raised_when_exception_source_is_graphmodule, test/test_fx.py::TestFX::test_deepcopy_graph_with_tracer_cls, test/test_fx.py::TestFX::test_deepcopy_graphmodule, test/test_fx.py::TestFX::test_deepcopy_graphmodule_with_transform, test/test_fx.py::TestFX::test_deepcopy_no_recursion, test/test_fx.py::TestFX::test_deepcopy_recursion_depth, test/test_fx.py::TestFX::test_deepcopy_tracer, test/test_fx.py::TestFX::test_deepcopy_with_submods_params, test/test_fx.py::TestFX::test_delete_unused_submodules_leaf, test/test_fx.py::TestFX::test_delete_unused_values, test/test_fx.py::TestFX::test_dict, test/test_fx.py::TestFX::test_direct_param_use, test/test_fx.py::TestFX::test_disallow_override, test/test_fx.py::TestFX::test_ellipsis, test/test_fx.py::TestFX::test_empty_graph_codegen, test/test_fx.py::TestFX::test_enum, test/test_fx.py::TestFX::test_erase_node_error, test/test_fx.py::TestFX::test_example_shape_prop, test/test_fx.py::TestFX::test_find_uses, test/test_fx.py::TestFX::test_fn_type_annotation_empty, test/test_fx.py::TestFX::test_fn_type_annotations, test/test_fx.py::TestFX::test_fx_and_or, test/test_fx.py::TestFX::test_fx_create_arg, test/test_fx.py::TestFX::test_fx_shifts, test/test_fx.py::TestFX::test_fx_stateless, test/test_fx.py::TestFX::test_get_torch_func_signature, test/test_fx.py::TestFX::test_getitem, test/test_fx.py::TestFX::test_getitem_subproc, test/test_fx.py::TestFX::test_graph_edit_with_proxy, test/test_fx.py::TestFX::test_graph_fns, test/test_fx.py::TestFX::test_graph_module, test/test_fx.py::TestFX::test_graph_module_init_buffer_param_copied_dict_init, test/test_fx.py::TestFX::test_graph_module_init_buffer_param_copied_mod_init, test/test_fx.py::TestFX::test_graph_module_replicate_for_dp, test/test_fx.py::TestFX::test_graph_unique_names, test/test_fx.py::TestFX::test_graph_unique_names_manual, test/test_fx.py::TestFX::test_immutable_dict_pytree_ops, test/test_fx.py::TestFX::test_immutable_list_pytree_ops, test/test_fx.py::TestFX::test_imul_code_print, test/test_fx.py::TestFX::test_inf_nan, test/test_fx.py::TestFX::test_inf_nan_kwds, test/test_fx.py::TestFX::test_informative_co_filename, test/test_fx.py::TestFX::test_inline_graph, test/test_fx.py::TestFX::test_insert_arg, test/test_fx.py::TestFX::test_insertion_point, test/test_fx.py::TestFX::test_interpreter, test/test_fx.py::TestFX::test_interpreter_boxed_run_argument_validation, test/test_fx.py::TestFX::test_interpreter_default_args, test/test_fx.py::TestFX::test_interpreter_gc_values, test/test_fx.py::TestFX::test_interpreter_noop_resnet18, test/test_fx.py::TestFX::test_interpreter_not_enough_args, test/test_fx.py::TestFX::test_interpreter_onthefly_swap, test/test_fx.py::TestFX::test_interpreter_other_graph, test/test_fx.py::TestFX::test_interpreter_partial_eval, test/test_fx.py::TestFX::test_interpreter_run_node_override, test/test_fx.py::TestFX::test_interpreter_star_args, test/test_fx.py::TestFX::test_interpreter_with_codegen, test/test_fx.py::TestFX::test_layout, test/test_fx.py::TestFX::test_leaf_module, test/test_fx.py::TestFX::test_lineno_map, test/test_fx.py::TestFX::test_matmul_tracing, test/test_fx.py::TestFX::test_metadata_on_ph, test/test_fx.py::TestFX::test_module_deepcopy_edit_nodes, test/test_fx.py::TestFX::test_move_before, test/test_fx.py::TestFX::test_multi_insert_point, test/test_fx.py::TestFX::test_multiple_default_args, test/test_fx.py::TestFX::test_named_tuple_inlined, test/test_fx.py::TestFX::test_namedtuple_return_qualname, test/test_fx.py::TestFX::test_namedtuple_return_trace, test/test_fx.py::TestFX::test_native_callable, test/test_fx.py::TestFX::test_nn_module_stack, test/test_fx.py::TestFX::test_no_mutation, test/test_fx.py::TestFX::test_node_tagging, test/test_fx.py::TestFX::test_nonetype_annotation, test/test_fx.py::TestFX::test_partial_trace, test/test_fx.py::TestFX::test_pickle_custom_import, test/test_fx.py::TestFX::test_pickle_graphmodule, test/test_fx.py::TestFX::test_pickle_nonetype_annotation, test/test_fx.py::TestFX::test_pickle_torch_custom_ops, test/test_fx.py::TestFX::test_prepend_does_not_leak, test/test_fx.py::TestFX::test_prepend_self, test/test_fx.py::TestFX::test_pretty_print, test/test_fx.py::TestFX::test_pretty_print_graph, test/test_fx.py::TestFX::test_pretty_print_node, test/test_fx.py::TestFX::test_pretty_print_targets, test/test_fx.py::TestFX::test_print_graph, test/test_fx.py::TestFX::test_profiler_multiple_modules, test/test_fx.py::TestFX::test_profiler_nested_graph_modules, test/test_fx.py::TestFX::test_profiler_ranges_side_effect, test/test_fx.py::TestFX::test_profiler_stack_trace_augmentation, test/test_fx.py::TestFX::test_proxy_deepcopy_with_tracer, test/test_fx.py::TestFX::test_proxy_deepcopy_without_tracer, test/test_fx.py::TestFX::test_pytree, test/test_fx.py::TestFX::test_pytree_concrete, test/test_fx.py::TestFX::test_reassign_args_kwargs_uses, test/test_fx.py::TestFX::test_regular_and_default_args, test/test_fx.py::TestFX::test_remove_uses, test/test_fx.py::TestFX::test_remove_uses_with_custom_filter, test/test_fx.py::TestFX::test_replace_input, test/test_fx.py::TestFX::test_replace_uses, test/test_fx.py::TestFX::test_reserved_getattr, test/test_fx.py::TestFX::test_return_tuple, test/test_fx.py::TestFX::test_return_type_exists, test/test_fx.py::TestFX::test_return_type_exists_pre_pep585, test/test_fx.py::TestFX::test_script_method_trace, test/test_fx.py::TestFX::test_script_tensor_constant, test/test_fx.py::TestFX::test_sequential, test/test_fx.py::TestFX::test_shape_prop_aggregate, test/test_fx.py::TestFX::test_shape_prop_layout, test/test_fx.py::TestFX::test_shape_prop_layout_3d, test/test_fx.py::TestFX::test_shape_prop_unbacked_sym, test/test_fx.py::TestFX::test_single_default_arg, test/test_fx.py::TestFX::test_snake_case, test/test_fx.py::TestFX::test_sqrt, test/test_fx.py::TestFX::test_stack_traces, test/test_fx.py::TestFX::test_stack_traces_with_transformer, test/test_fx.py::TestFX::test_string_literal_return, test/test_fx.py::TestFX::test_submodule_manipulation_API, test/test_fx.py::TestFX::test_symbolic_trace_assert, test/test_fx.py::TestFX::test_symbolic_trace_sequential, test/test_fx.py::TestFX::test_tensor_attribute, test/test_fx.py::TestFX::test_tensor_attribute_coalseced, test/test_fx.py::TestFX::test_tensor_constant, test/test_fx.py::TestFX::test_throw_out_variant, test/test_fx.py::TestFX::test_torch_custom_ops, test/test_fx.py::TestFX::test_torch_fx_getattr, test/test_fx.py::TestFX::test_torch_fx_len, test/test_fx.py::TestFX::test_torch_op_overloads, test/test_fx.py::TestFX::test_torchbind_class_attribute_in_fx, test/test_fx.py::TestFX::test_torchbind_class_attribute_in_fx_tensor_arg, test/test_fx.py::TestFX::test_trace_buffer_slice, test/test_fx.py::TestFX::test_trace_dict_int_keys, test/test_fx.py::TestFX::test_trace_dict_proxy_keys, test/test_fx.py::TestFX::test_trace_fn_constant, test/test_fx.py::TestFX::test_trace_function, test/test_fx.py::TestFX::test_trace_multiple_funcs, test/test_fx.py::TestFX::test_trace_return_dataclass, test/test_fx.py::TestFX::test_trace_return_dataclass_nested, test/test_fx.py::TestFX::test_trace_return_namedtuple, test/test_fx.py::TestFX::test_tracing_graphmodules_as_leaf_submodules, test/test_fx.py::TestFX::test_transformer_multi_outputs, test/test_fx.py::TestFX::test_transformer_noop, test/test_fx.py::TestFX::test_transformer_op_swap, test/test_fx.py::TestFX::test_transformer_preserves_nn_module_stack_for_get_attr, test/test_fx.py::TestFX::test_tuple_no_subscript, test/test_fx.py::TestFX::test_typename_print, test/test_fx.py::TestFX::test_typename_print_pre_pep585, test/test_fx.py::TestFX::test_typename_print_union, test/test_fx.py::TestFX::test_unpack, test/test_fx.py::TestFX::test_unpack_dict_better_error, test/test_fx.py::TestFX::test_unpack_list_better_error, test/test_fx.py::TestFX::test_update_args_api, test/test_fx.py::TestFX::test_update_args_kwargs_yells_at_you, test/test_fx.py::TestFX::test_update_kwargs_api, test/test_fx.py::TestFX::test_user_friendly_call_provenance_with_function, test/test_fx.py::TestFX::test_user_friendly_call_provenance_with_module, test/test_fx.py::TestFX::test_varargs_concrete, test/test_fx.py::TestFX::test_wrap, test/test_fx.py::TestFX::test_wrap_decorated_function, test/test_fx.py::TestFX::test_wrap_fn_directly, test/test_fx.py::TestFX::test_wrap_with_submodule, test/test_fx.py::TestFX::test_wrapped_method, test/test_fx.py::TestFX::test_wrapped_retrace, test/test_fx.py::TestFX::test_wrapped_via_decorator, test/test_fx.py::TestFX::test_wrapped_via_decorator_and_transformed, test/test_fx.py::TestFX::test_wrong_target_type, test/test_fx.py::TestFX::test_wrong_topo, test/test_fx.py::TestFXAPIBackwardCompatibility::test_adding_side_effect_function, test/test_fx.py::TestFXAPIBackwardCompatibility::test_class_member_back_compat, test/test_fx.py::TestFXAPIBackwardCompatibility::test_function_back_compat, test/test_fx.py::TestFXAPIBackwardCompatibility::test_preserve_unused_attr_after_unpickle, test/test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool1d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_affine_grid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_alpha_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_batch_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_bilinear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_binary_cross_entropy, test/test_fx.py::TestFunctionalTracing::test_nn_functional_binary_cross_entropy_with_logits, test/test_fx.py::TestFunctionalTracing::test_nn_functional_celu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_celu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_channel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_tbc, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cosine_embedding_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cosine_similarity, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cross_entropy, test/test_fx.py::TestFunctionalTracing::test_nn_functional_ctc_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_elu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_elu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_embedding, test/test_fx.py::TestFunctionalTracing::test_nn_functional_embedding_bag, test/test_fx.py::TestFunctionalTracing::test_nn_functional_feature_alpha_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gaussian_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gelu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_glu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_grid_sample, test/test_fx.py::TestFunctionalTracing::test_nn_functional_group_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_grouped_mm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gumbel_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardshrink, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardsigmoid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardswish, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardtanh, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardtanh_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hinge_embedding_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_huber_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_instance_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_interpolate, test/test_fx.py::TestFunctionalTracing::test_nn_functional_kl_div, test/test_fx.py::TestFunctionalTracing::test_nn_functional_l1_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_layer_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_leaky_relu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_leaky_relu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_linear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_local_response_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_log_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_logsigmoid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_margin_ranking_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool1d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_mish, test/test_fx.py::TestFunctionalTracing::test_nn_functional_mse_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multi_head_attention_forward, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multi_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multilabel_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multilabel_soft_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_native_channel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_normalize, test/test_fx.py::TestFunctionalTracing::test_nn_functional_one_hot, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pad, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pairwise_distance, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pdist, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pixel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pixel_unshuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_poisson_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_prelu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu6, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rms_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rrelu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rrelu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_scaled_dot_product_attention, test/test_fx.py::TestFunctionalTracing::test_nn_functional_scaled_grouped_mm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_scaled_mm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_selu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_selu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_silu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_smooth_l1_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_soft_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softmin, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softplus, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softshrink, test/test_fx.py::TestFunctionalTracing::test_nn_functional_threshold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_threshold_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_triplet_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_triplet_margin_with_distance_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_unfold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample_bilinear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample_nearest, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_H_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_T_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___getitem___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___radd___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rdiv___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmatmul___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmod___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmul___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rpow___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rsub___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__batch_norm_with_update_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__chunk_cat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__native_batch_norm_legit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__segment_reduce_lengths_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__segment_reduce_offsets_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__softmax_backward_data_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__unsafe_masked_index_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__unsafe_masked_index_put_accumulate_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__upsample_bilinear2d_aa_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_abs_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_acos_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_acosh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_add_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addbmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addcdiv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addcmul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmm_decomposed_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_alias_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_all_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_allclose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_aminmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_angle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_any_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_arange_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argsort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argwhere_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_partial_views_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_asin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_asinh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atan2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_baddbmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bernoulli_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bfloat16_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_block_diag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bool_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_shapes_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_to_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bucketize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_byte_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cartesian_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cauchy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cdist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cdouble_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ceil_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cfloat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_chalf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_char_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_inverse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_chunk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_max_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_min_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clone_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_column_stack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_combinations_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_complex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_conj_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_conj_physical_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_constant_pad_nd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_contiguous_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_copysign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_corrcoef_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cos_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cosh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_count_nonzero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cov_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cross_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cummax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cummin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumprod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumulative_trapezoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_deg2rad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diag_embed_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagflat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diff_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_digamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_floor_rounding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_no_rounding_mode_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_trunc_rounding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_double_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_einsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_permuted_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_eq_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_equal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erfc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erfinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exp2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expm1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exponential_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_eye_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fftshift_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifftshift_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flatten_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flip_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fliplr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flipud_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_float_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_float_power_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_floor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_floor_divide_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_frac_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_frexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_full_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_full_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gather_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ge_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_geometric_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_geqrf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gradient_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_grid_sampler_2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_grid_sampler_3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_half_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hash_tensor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_heaviside_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_histc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hypot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_i0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_igamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_igammac_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_add_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_put_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_inner_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_int_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isclose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isfinite_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isinf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isnan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isneginf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isposinf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isreal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_item_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_2inputs_2outputs_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_4inputs_with_extra_args_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_binary_return_by_ref_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_unary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_kron_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_kthvalue_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ldexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_le_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lerp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lgamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cholesky_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cholesky_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cond_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cross_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_det_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_diagonal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eig_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigvals_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigvalsh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_householder_product_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_inv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_inv_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_factor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_factor_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lstsq_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lstsq_grad_oriented_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_factor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_factor_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_power_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_rank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_rank_hermitian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_multi_dot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_norm_subgradients_at_zero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_hermitian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_singular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_qr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_slogdet_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_triangular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_svd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_svdvals_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_tensorinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_tensorsolve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vander_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vecdot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vector_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linspace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linspace_tensor_overload_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log10_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log1p_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_normal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_softmax_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logaddexp2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logaddexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logcumsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logdet_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_and_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_not_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_or_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_xor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logspace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logspace_tensor_overload_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_long_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_unpack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mH_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mT_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_argmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_argmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_cumprod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_cumsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_log_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_logaddexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_logsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_median_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_normalize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_softmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_std_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_var_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_matmul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_matrix_exp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_pool2d_with_indices_backward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_reduction_no_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_reduction_with_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_maximum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_median_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_meshgrid_list_of_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_meshgrid_variadic_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_reduction_no_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_reduction_with_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_minimum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mode_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_movedim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_msort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_multinomial_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nan_to_num_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanmean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanmedian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanquantile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nansum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_narrow_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_narrow_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_batch_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_dropout_backward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_layer_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ne_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_neg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_empty_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_empty_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_full_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_ones_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_zeros_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nextafter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_alpha_dropout_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_batch_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_bilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_binary_cross_entropy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_celu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_channel_shuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cosine_embedding_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cosine_similarity_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cross_entropy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_ctc_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_elu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_embedding_bag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_embedding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_fractional_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_fractional_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_gaussian_nll_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_gelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_glu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_grid_sample_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_group_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardsigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardswish_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardtanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hinge_embedding_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_huber_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_instance_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_area_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_bicubic_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_bilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_linear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_nearest_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_trilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_kl_div_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_l1_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_layer_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_leaky_relu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_linear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_local_response_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_logsigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_margin_ranking_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool1d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool2d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool3d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_mish_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_mse_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multi_head_attention_forward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multi_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multilabel_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_nll_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_normalize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_circular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_constant_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_reflect_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_replicate_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_replicate_negative_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pairwise_distance_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pdist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pixel_shuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pixel_unshuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_poisson_nll_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_prelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_relu6_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_relu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_rms_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_rrelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_selu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_silu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_smooth_l1_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_soft_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softmin_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softplus_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softsign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_tanhshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_threshold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_triplet_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_unfold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_upsample_bilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_upsample_nearest_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nonzero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nonzero_static_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_fro_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_inf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_nuc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_in_place_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_number_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ones_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ones_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ormqr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_outer_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pca_lowrank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_permute_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_permute_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pinverse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polar_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_4_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_positive_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pow_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_put_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_qr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_quantile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rad2deg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rand_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randint_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randint_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randn_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ravel_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_real_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reciprocal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_remainder_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_renorm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_repeat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_repeat_interleave_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reshape_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reshape_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resize__cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resize_as__cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resolve_conj_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resolve_neg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_roll_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rot90_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_neg_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rsqrt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rsub_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scalar_tensor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_add_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_searchsorted_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_select_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sgn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_short_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_bartlett_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_blackman_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_cosine_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_exponential_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_gaussian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_general_cosine_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_general_hamming_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_hamming_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_hann_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_kaiser_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_nuttall_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signbit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sinc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sinh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_slice_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_slice_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_softmax_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sparse_mm_reduce_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sparse_sampled_addmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_airy_ai_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_j0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_j1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_y0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_y1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_u_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_v_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_w_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_entr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_erfcx_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_hermite_polynomial_h_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_hermite_polynomial_he_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i0e_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i1e_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_laguerre_polynomial_l_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_legendre_polynomial_p_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_log_ndtr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_i0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_i1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_k0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_k1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_ndtr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_ndtri_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_scaled_modified_bessel_k0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_scaled_modified_bessel_k1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_spherical_bessel_j0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_xlog1py_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_zeta_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_list_args_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_with_sizes_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_with_sizes_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sqrt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_square_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_multiple_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_stack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_mean_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_stft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sub_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sum_to_size_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_svd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_svd_lowrank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_t_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_take_along_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_take_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tensor_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tensordot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_to_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_to_sparse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_topk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_transpose_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_transpose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trapezoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trapz_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_triangular_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tril_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_triu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_true_divide_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trunc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unbind_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unbind_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unflatten_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unfold_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unfold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_uniform_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unique_consecutive_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unique_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsafe_chunk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsafe_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsqueeze_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsqueeze_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_mean_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vdot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_as_complex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_where_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_xlogy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zero__cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zeros_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zeros_like_cuda_float32, test/test_fx.py::TestVisionTracing::test_torchvision_models_alexnet, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_base, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_small, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_tiny, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet121, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet161, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet169, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet201, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_mobilenet_v3_large_320_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_mobilenet_v3_large_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fcos_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_keypointrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_maskrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_maskrcnn_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_retinanet_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_retinanet_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_ssd300_vgg16, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_ssdlite320_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b0, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b1, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b2, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b3, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b4, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b5, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b6, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b7, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_l, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_m, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_googlenet, test/test_fx.py::TestVisionTracing::test_torchvision_models_inception_v3, test/test_fx.py::TestVisionTracing::test_torchvision_models_maxvit_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet0_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet0_75, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet1_3, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v3_small, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_16gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_1_6gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_32gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_3_2gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_400mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_800mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_8gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_128gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_16gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_1_6gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_32gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_3_2gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_400mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_800mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_8gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet152, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet18, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet34, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext101_32x8d, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext101_64x4d, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext50_32x4d, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_fcn_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_fcn_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_lraspp_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x0_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x1_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x2_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_squeezenet1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_squeezenet1_1, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg11, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg11_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg13, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg13_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg16_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg19, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg19_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mc3_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mvit_v1_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mvit_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_r2plus1d_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_r3d_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_s3d, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_32, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_h_14, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_l_16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_l_32, test/test_fx.py::TestVisionTracing::test_torchvision_models_wide_resnet101_2, test/test_fx.py::TestVisionTracing::test_torchvision_models_wide_resnet50_2
2025-12-04T13:48:51.8041002Z 
2025-12-04T13:48:51.8041184Z Finished test_fx 1/1 ... [2025-12-04 13:48:51.741234][17771.669530763], took 2.98min
2025-12-04T13:48:51.8041784Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_fx/test_fx-26e481d16f3bec04.xml
2025-12-04T13:48:51.9046645Z Running test_ops_gradients 2/4 ... [2025-12-04 13:48:51.904441][17771.832739937]
2025-12-04T13:48:51.9047069Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:48:51.9049930Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_gradients.py', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:48:51.904759]
2025-12-04T13:54:03.2137401Z 
2025-12-04T13:54:03.2138311Z test_ops_gradients 2/4 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_gradients_2.4_85b1c0ac8f503e20_.log
2025-12-04T13:54:03.2501387Z Running 1350 items in this shard: test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyMulScalarCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyNMSCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_NumpyViewCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___getitem___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rdiv___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rsub___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad___rsub___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__chunk_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__segment_reduce_offsets_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_acosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addcdiv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addcmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_all_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_allclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_any_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_arange_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_argwhere_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_as_strided_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_asin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_baddbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_bernoulli_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_block_diag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_block_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cdouble_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_inverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cholesky_inverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_clamp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_clone_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_clone_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_combinations_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_complex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_copysign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cumulative_trapezoid_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_cumulative_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diagonal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_diff_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_dstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_expand_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_fft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifftshift_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ifftshift_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_ihfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_irfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fft_rfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fliplr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_floor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_fmod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_frexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_full_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_full_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_gather_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_geqrf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_hypot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_igammac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_index_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_inner_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isfinite_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isnan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_isneginf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_item_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_jiterator_unary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_kron_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ldexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ldexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_le_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lgamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cholesky_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cond_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_cond_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_det_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_eig_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_inv_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_lu_factor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_pinv_singular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_slogdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_slogdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_solve_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_solve_triangular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_svd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_linalg_vecdot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log1p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_log_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logcumsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_and_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logical_and_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_lu_unpack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_argmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_prod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_softmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_masked_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_matrix_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_max_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_mm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_movedim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nan_to_num_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nanmedian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nanquantile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_narrow_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_native_dropout_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_new_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_alpha_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_celu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_conv2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_elu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_embedding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_glu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_grid_sample_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_hardshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_instance_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_interpolate_area_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_leaky_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_logsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_mish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pairwise_distance_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pairwise_distance_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_rms_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_selu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_silu_complex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_softplus_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_tanhshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nn_functional_upsample_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_nonzero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_norm_nuc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ormqr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_ormqr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_pca_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_permute_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_polar_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_pow_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_prod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_put_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randint_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randint_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_randn_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reciprocal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reciprocal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_remainder_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_renorm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_repeat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reshape_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_resize_as__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_round_decimals_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_round_decimals_neg_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_rsub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_scalar_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_scatter_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_searchsorted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_select_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_gaussian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_general_hamming_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_signal_windows_kaiser_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sinc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sinh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_softmax_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_bessel_j1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_bessel_y1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_entr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_i1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_i1e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_log_ndtr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_modified_bessel_i0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_split_list_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_split_with_sizes_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_squeeze_multiple_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_std_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_stft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_stft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_take_along_dim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tensor_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_tensordot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_to_sparse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_transpose_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_trapz_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_triangular_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_triu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_true_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unbind_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unbind_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unflatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_uniform_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unique_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_var_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_var_mean_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_as_complex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_view_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_vstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_fail_gradgrad_zeros_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_H_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpyNMSCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpySplitCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpySplitCopyWithIntCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_NumpyViewCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___radd___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___rmatmul___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad___rmod___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__batch_norm_with_update_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__segment_reduce_lengths_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad__upsample_bilinear2d_aa_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_abs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_abs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_acos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addcdiv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addcmul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addmv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_addr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_all_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_any_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_any_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_as_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_as_strided_partial_views_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_as_strided_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atleast_1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_atleast_3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_baddbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_baddbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_bmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_bool_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_broadcast_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_broadcast_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cfloat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_chalf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_clone_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_combinations_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_combinations_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cond_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_constant_pad_nd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_contiguous_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_corrcoef_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_corrcoef_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_count_nonzero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_deg2rad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diag_embed_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_diagonal_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_dsplit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_empty_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_equal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_eye_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_hfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_ifft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_ifftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_ihfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_irfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_irfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_rfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fft_rfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_flatten_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_flip_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_flip_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fliplr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fliplr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_fmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_full_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_geqrf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hash_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_histc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_hstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_i0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_put_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_index_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_isin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_item_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_jiterator_unary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_cholesky_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_inv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_inv_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_ldl_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lstsq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lu_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_lu_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_matrix_rank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_svdvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_tensorinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linalg_vector_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linspace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_linspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log1p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_log_softmax_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_and_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_and_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logical_not_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logspace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_long_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_cumprod_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_masked_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_matmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_meshgrid_variadic_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_min_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_minimum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_movedim_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_mul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nanmean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nanquantile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_native_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ne_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_empty_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_new_zeros_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_alpha_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_celu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_channel_shuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv_transpose1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_elu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_hardshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_kl_div_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_linear_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_reflect_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pad_replicate_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_rms_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_rrelu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_softplus_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_softshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_softsign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_nonzero_static_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_fro_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_norm_inf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_permute_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_polygamma_polygamma_n_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_polygamma_polygamma_n_2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_polygamma_polygamma_n_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_pow_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rad2deg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_ravel_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_real_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_reciprocal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_remainder_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_renorm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_repeat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_repeat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_reshape_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resize_as__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resolve_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_resolve_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_roll_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_rot90_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_round_decimals_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_round_decimals_neg_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scalar_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_scatter_reduce_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sgn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_gaussian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signal_windows_nuttall_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_signbit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sinc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sparse_sampled_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_airy_ai_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_entr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_hermite_polynomial_he_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_i1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_special_spherical_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_split_with_sizes_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_split_with_sizes_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sqrt_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_squeeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_squeeze_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_squeeze_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_sub_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_t_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tensordot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_tile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_to_sparse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_to_sparse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trace_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_transpose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trapz_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_triu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_true_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_trunc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unfold_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsafe_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_unsqueeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_var_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_view_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_while_loop_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zeros_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_grad_zeros_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpyNMSCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_NumpySortCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_T_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___getitem___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___rmul___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad___rsub___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__native_batch_norm_legit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad__upsample_bilinear2d_aa_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_abs_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_acos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addcdiv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_alias_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_all_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_allclose_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_angle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_as_strided_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_asin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_atanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_baddbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_bfloat16_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_block_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_broadcast_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_bucketize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cartesian_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cauchy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ceil_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cfloat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cfloat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_chalf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_char_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_char_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clamp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clamp_max_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clamp_min_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_clone_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_column_stack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_column_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_combinations_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_combinations_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_conj_physical_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_constant_pad_nd_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_contiguous_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cumulative_trapezoid_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_cumulative_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_deg2rad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diagonal_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diagonal_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_diff_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_double_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_dstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_dstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_permuted_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_eq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_equal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expand_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_expm1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_exponential_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_eye_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_fftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_hfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_flip_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_floor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_fmin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_gather_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_half_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_hstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_igamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_igammac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_imag_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_invoke_subgraph_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isposinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_isreal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_item_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_kthvalue_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lerp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lerp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_cholesky_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_cholesky_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eig_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eigh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_eigvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_ldl_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_lstsq_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_matrix_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_pinv_singular_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_solve_triangular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_tensorinv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_linalg_vector_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log1p_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_log_softmax_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logcumsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logical_and_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logical_xor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_logit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_lu_unpack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mH_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mH_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_map_nested_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_map_triple_nested_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_std_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_masked_std_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_matmul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_max_binary_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_max_pool2d_with_indices_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_meshgrid_list_of_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_meshgrid_variadic_tensors_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_min_reduction_with_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nanmean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nansum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_native_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_empty_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_empty_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_full_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_new_zeros_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_ctc_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_dropout_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_elu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_embedding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_hardtanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_interpolate_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_max_unpool3d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_constant_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_constant_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_reflect_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pad_reflect_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pdist_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_silu_complex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softplus_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_softshrink_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_nonzero_static_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_norm_fro_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_normal_in_place_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ones_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_ormqr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_permute_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_pinverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_polygamma_polygamma_n_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_polygamma_polygamma_n_2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_qr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_rad2deg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_rand_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_randn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_reciprocal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_renorm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_repeat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_repeat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scalar_tensor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scan_simple_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_reduce_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sgn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_blackman_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_signal_windows_nuttall_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sinc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sinh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_airy_ai_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_bessel_y1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_chebyshev_polynomial_v_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_hermite_polynomial_he_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_i1e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_legendre_polynomial_p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_modified_bessel_i0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_modified_bessel_k1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_scaled_modified_bessel_k0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_special_spherical_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_squeeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_squeeze_multiple_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_stack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_std_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_stft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sum_to_size_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_sum_to_size_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_svd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_svd_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tensor_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tile_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_tile_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_topk_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_transpose_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_triangular_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_triu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unbind_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unflatten_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unflatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unfold_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_uniform_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_unsafe_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_var_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_var_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_var_mean_unbiased_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_view_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_view_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_vstack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_xlogy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_zeros_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_fn_gradgrad_zeros_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_H_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_H_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpyMulCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_NumpySplitCopyCustomOp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___getitem___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rdiv___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad___rsub___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__batch_norm_with_update_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__segment_reduce_offsets_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad__upsample_bilinear2d_aa_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_abs_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_acos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_acosh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_acosh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addbmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addcdiv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addmv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_addr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_alias_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_all_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_angle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_any_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_argwhere_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_asin_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_asin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_asinh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atleast_2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_atleast_3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_bernoulli_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_bfloat16_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_bmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_broadcast_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_byte_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cfloat_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_chalf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_clamp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_conj_physical_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_copysign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_corrcoef_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_count_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cross_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cummin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_cumulative_trapezoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diagonal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diff_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_diff_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_div_floor_rounding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_div_no_rounding_mode_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_div_trunc_rounding_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_dot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_double_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_double_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_dsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_erf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_erfc_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_expm1_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_expm1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_fft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_fftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfftn_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_ifftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_irfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_irfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fft_rfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_flip_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_fliplr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_flipud_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_float_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_float_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_frac_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ge_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_geometric_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_gradient_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_grid_sampler_2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_half_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_hsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_hstack_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_add_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_index_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_int_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isfinite_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isfinite_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isinf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isinf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_isneginf_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_istft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_jiterator_binary_return_by_ref_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_jiterator_unary_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_le_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lerp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvalsh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_eigvalsh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_inv_ex_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_ldl_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_ldl_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_lu_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_multi_dot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_multi_dot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_slogdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_slogdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_svdvals_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_tensorinv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_linalg_tensorsolve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log10_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_normal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_softmax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_log_softmax_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logdet_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logical_not_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logical_xor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logspace_tensor_overload_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_logspace_tensor_overload_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_lu_unpack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mH_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mT_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_logsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_logsumexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_select_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_select_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_masked_var_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_meshgrid_list_of_tensors_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_min_reduction_with_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mode_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_msort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mul_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_narrow_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_narrow_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_native_batch_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ne_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_new_empty_strided_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nextafter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_avg_pool1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_channel_shuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_channel_shuffle_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_cross_entropy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_ctc_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_embedding_bag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_grid_sample_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_hardswish_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_huber_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_interpolate_linear_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_leaky_relu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_max_unpool1d_grad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_multi_head_attention_forward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pad_replicate_negative_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_relu6_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_silu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_soft_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_softsign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nn_functional_upsample_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_nonzero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ormqr_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_outer_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pca_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pca_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_permute_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_permute_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_pinverse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_positive_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rand_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rand_like_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_randn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_ravel_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_reciprocal_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_reshape_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_reshape_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resize_as__cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resize_as__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resolve_conj_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_resolve_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_roll_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rot90_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_rsub_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scalar_tensor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_blackman_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signal_windows_gaussian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_signbit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sinc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_slice_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sparse_sampled_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_sparse_sampled_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_bessel_j0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_i0e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_i1e_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_laguerre_polynomial_l_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_split_list_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_split_with_sizes_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_split_with_sizes_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_square_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_std_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_stft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_t_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_t_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_take_along_dim_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tan_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tensor_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_tensordot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_transpose_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_triu_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_true_divide_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unique_consecutive_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unique_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsqueeze_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_unsqueeze_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_mean_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_var_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_vdot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_view_as_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_view_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_vsplit_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_where_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_grad_where_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_H_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rdiv___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rmod___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rpow___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rsub___cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad___rsub___cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad__batch_norm_with_update_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad__chunk_cat_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_acos_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_add_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addcmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_addmm_decomposed_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_alias_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_all_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_allclose_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_any_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_argsort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_as_strided_partial_views_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_asin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atan_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_atleast_3d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_baddbmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_bmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_bmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_broadcast_to_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_broadcast_to_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cartesian_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ceil_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_char_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cholesky_inverse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cholesky_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_clamp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_conj_physical_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_constant_pad_nd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_contiguous_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cos_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_cumprod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_deg2rad_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diag_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diagonal_scatter_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_diff_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_dot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_double_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_einsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_empty_permuted_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_eq_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_exp2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_expand_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_fft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_fft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_hfft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_hfftn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_ifft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_rfft2_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fft_rfft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_flatten_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_flip_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_float_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_floor_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_floor_divide_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_full_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_gradient_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_grid_sampler_3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_heaviside_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_hypot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_igamma_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_fill_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_put_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_index_reduce_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_int_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_isin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_istft_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_cross_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_det_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eig_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eig_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eigh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_eigh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_inv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_inv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_ldl_factor_ex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_ldl_solve_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lu_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_lu_factor_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_power_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_power_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_pinv_hermitian_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_pinv_singular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_solve_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_solve_triangular_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_svd_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_linalg_svdvals_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_log1p_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_log2_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logaddexp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logcumsumexp_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logdet_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_logspace_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_lu_unpack_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mH_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mH_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_amax_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumsum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_cumsum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_fill_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_normalize_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_normalize_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_sum_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_masked_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_matmul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_matrix_exp_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mode_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mul_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mv_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_mv_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nanmean_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nanmean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_native_dropout_backward_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_native_layer_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ne_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_neg_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_empty_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_empty_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_ones_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_new_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv1d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv2d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose2d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_conv_transpose3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_hardsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_hardtanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_instance_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_interpolate_area_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_l1_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_logsigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_max_unpool3d_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pad_reflect_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_pad_reflect_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_rms_norm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_softsign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_tanhshrink_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_threshold_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_nn_functional_upsample_nearest_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_norm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_norm_inf_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_normal_in_place_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_normal_number_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ones_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ones_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_outer_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_pca_lowrank_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_permute_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_polygamma_polygamma_n_1_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_prod_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_put_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_qr_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_randn_like_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_ravel_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_reshape_as_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resize__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resize_as__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resolve_conj_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_resolve_neg_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_roll_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_rot90_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_round_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_round_decimals_0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_rsub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_add_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_reduce_amin_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_scatter_reduce_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sgn_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sigmoid_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sign_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_bartlett_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_general_cosine_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signal_windows_kaiser_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_signbit_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sinc_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_slice_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_softmax_with_dtype_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sort_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sparse_sampled_addmm_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sparse_sampled_addmm_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_airy_ai_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_modified_bessel_k0_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_list_args_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_split_with_sizes_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sqrt_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sqrt_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_stft_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sub_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sum_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sum_to_size_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_sum_to_size_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_svd_lowrank_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_t_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_t_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_t_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tanh_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tanh_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_to_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_to_sparse_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_to_sparse_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trapz_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_trapz_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_tril_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unbind_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unfold_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_uniform_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsafe_chunk_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsafe_split_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsqueeze_copy_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsqueeze_copy_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_unsqueeze_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_var_mean_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_var_unbiased_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_vdot_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_vdot_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_as_complex_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_cuda_complex128, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_view_cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_zero__cuda_float64, test/test_ops_gradients.py::TestBwdGradientsCUDA::test_inplace_gradgrad_zeros_cuda_complex128
2025-12-04T13:54:03.2850080Z 
2025-12-04T13:54:03.2850310Z Finished test_ops_gradients 2/4 ... [2025-12-04 13:54:03.215429][18083.143720872], took 5.19min
2025-12-04T13:54:03.2851014Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-e4282282966a7ff9.xml
2025-12-04T13:54:03.3537684Z Running test_nestedtensor 3/4 ... [2025-12-04 13:54:03.353540][18083.281838322]
2025-12-04T13:54:03.3538125Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:54:03.3541091Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nestedtensor.py', '--shard-id=3', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:54:03.353849]
2025-12-04T14:00:01.7448863Z 
2025-12-04T14:00:01.7449810Z test_nestedtensor 3/4 was successful, full logs can be found in artifacts with path test/test-reports/test_nestedtensor_3.4_dff7d83ca5d5d6c7_.log
2025-12-04T14:00:01.7583942Z Running 398 items in this shard: test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_is_contiguous, test/test_nestedtensor.py::TestNestedTensor::test_like_functions_zeros_like, test/test_nestedtensor.py::TestNestedTensor::test_unbind_3, test/test_nestedtensor.py::TestNestedInt::test_comparisons, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_clone_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_contiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_jagged_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_jagged_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_strided_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_embedding_jagged_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_empty_like_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_int16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_int32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_int32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_breaking_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_narrow_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_masked_select_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_chunk_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_256_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_8_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_reshape_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_scaled_dot_product_attention_input_dim_4_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim2_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim4_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_cos_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isposinf_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_relu_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_sgn_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_sqrt_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_tanh_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unbind_noncontiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_inference_mode_interaction_cuda_float32, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_sub_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_dropout_backward_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_indexing_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_2_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_32_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_edge_case_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_2_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_generates_leaf_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_linear_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_reshape_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_set_requires_grad_from_list_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_set_requires_grad_from_mask_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_to_buffer_series_ops_grad_with_broadcast_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_autograd_function_with_None_grad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_with_nested_int_second_arg_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_with_propagated_dynamic_max_seq_len_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_composite_op_in_inference_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_construction_from_list_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_copy__cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_with_lengths_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_nt_dim_3_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_nt_dim_4_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_narrow_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_from_jagged_pass_min_max_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_profiler_sequence_nr_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_record_stream_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_reshape_decomp_requires_grad_True_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_autocast_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_compile_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_compile_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_packed_in_proj_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_transposed_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_specialize_dynamic_shape_recompile_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_squeeze_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_1_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_view_ragged_idx_not_one_cuda, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rmod___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_bmm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_erfc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_exp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_logaddexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_threshold_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_remainder_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rmul___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_frexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_lgamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_minimum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_threshold_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polar_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_positive_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_erfcx_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clamp_max_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_div_floor_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_floor_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_igammac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isneginf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isreal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_celu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_reciprocal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sgn_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_short_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_signbit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_j1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_y1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_hermite_polynomial_he_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_legendre_polynomial_p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_scaled_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_trunc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_acos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_bfloat16_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_count_nonzero_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_exp2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_hash_tensor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isclose_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isreal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_binary_return_by_ref_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_linalg_vector_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_and_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_not_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_xor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_long_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_lt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nansum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_neg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nextafter_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_embedding_bag_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_logsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_prelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_j0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_erfcx_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i1e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_laguerre_polynomial_l_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_legendre_polynomial_p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_squeeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_nested_tensor_input_mutation_backward_cuda
2025-12-04T14:00:01.7710442Z 
2025-12-04T14:00:01.7710664Z Finished test_nestedtensor 3/4 ... [2025-12-04 14:00:01.745452][18441.673744426], took 5.97min
2025-12-04T14:00:01.7791189Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-59ab3583fe1f80dc.xml
2025-12-04T14:00:01.8626241Z Running functorch/test_control_flow 4/4 ... [2025-12-04 14:00:01.862370][18441.790667554]
2025-12-04T14:00:01.8627012Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:00:01.8629767Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_control_flow.py', '--shard-id=4', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:00:01.862668]
2025-12-04T14:08:01.2653077Z 
2025-12-04T14:08:01.2653994Z functorch/test_control_flow 4/4 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_control_flow_4.4_38ed588b114098c9_.log
2025-12-04T14:08:01.2882400Z Running 478 items in this shard: test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_different_pytree_output, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_grad_through_cond, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_nested, test/functorch/test_control_flow.py::TestControlFlow::test_cond_autograd_pytree_not_all_inputs_used, test/functorch/test_control_flow.py::TestControlFlow::test_cond_in_forloop, test/functorch/test_control_flow.py::TestControlFlow::test_map_autograd_no_grad_output, test/functorch/test_control_flow.py::TestControlFlow::test_map_autograd_simple_partial_grad, test/functorch/test_control_flow.py::TestControlFlow::test_map_list_in_out, test/functorch/test_control_flow.py::TestControlFlow::test_scan_binary_operator_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_carry_output_alias, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_compile_mode_none_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_complex_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_eager_partial_grad_random_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_additional_inputs_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_complex_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_False_compile_mode_none_partial_grad_random_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_complex_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_init_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_eager_partial_grad_xs_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_RNN_partial_autograd_reverse_True_compile_mode_none_partial_grad_complex_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_carries_ys_same_grad_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_all_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_additional_inputs_partial_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_for_out_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_equal_grad_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_combine_fn_with_no_grad_init_carries_unequal_grad_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_closure_nested_reverse_True_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_cnt_reverse_True_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_compile_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_complex_pytree_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dim_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_matmul_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_downstream_scan_scan_dim_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cpu_complex64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cuda_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_eager_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cpu_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cuda_int32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_False_compile_mode_none_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cpu_float16, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cuda_float32, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_eager_cuda_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_dtype_reverse_True_compile_mode_none_cpu_int64, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_False_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_pytree_complex_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_carry_shape, test/functorch/test_control_flow.py::TestControlFlow::test_scan_init_wrong_pytree_complex_reverse_False_cpu, test/functorch/test_control_flow.py::TestControlFlow::test_scan_input_carry_alias, test/functorch/test_control_flow.py::TestControlFlow::test_scan_input_mutation, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_1_device_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_multiple_layers_gradient_layers_3_device_cuda, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cuda_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cpu_autograd_False, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_False_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_non_pointwise_reverse_True_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_False_compile_mode_none_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_scan_tuple_reverse_True_compile_mode_none_cuda_autograd_True, test/functorch/test_control_flow.py::TestControlFlow::test_while_loop_gpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_binary_operator_compile_mode_none_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_eager_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_False_compile_mode_none_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_eager_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_generic_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_compile_reverse_True_compile_mode_none_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_complex_pytree_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_cond_in_combine_fn_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_none_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_different_input_size_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_compile_dynamic_shape_combine_mode_pointwise_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_generic_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_False_compile_mode_eager_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_generic_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_combine_mode_pointwise_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_compile_dynamic_shape_combine_mode_generic_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_dim_reverse_True_compile_mode_eager_combine_mode_pointwise_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_compile_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_generic_compile_mode_none_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_matmul_combine_mode_pointwise_compile_mode_none_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_eager_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_generic_compile_mode_none_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_compile_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_eager_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_combine_mode_pointwise_compile_mode_none_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_False_same_direction_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_compile_reverse_first_True_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_False_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_eager_reverse_first_False_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_False_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_generic_compile_mode_none_reverse_first_True_same_direction_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_False_same_direction_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_first_True_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_compile_reverse_first_False_same_direction_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_False_same_direction_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_eager_reverse_first_False_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_False_same_direction_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_downstream_scan_scan_different_dim_combine_mode_pointwise_compile_mode_none_reverse_first_True_same_direction_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_expand_in_combine_fn_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_compile_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_fct_generic_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_nested_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_False_cpu_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cpu_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_dynamic_shape_reverse_True_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_False_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_compile_reverse_False_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cpu_combine_mode_pointwise_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_False_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_eager_reverse_True_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_False_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_False_cuda_combine_mode_pointwise_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cpu_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cpu_combine_mode_generic_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cuda_combine_mode_generic_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_pytree_compile_mode_none_reverse_True_cuda_combine_mode_pointwise_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_shape_check_compile_mode_none_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_eager_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_freevars_simple_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_dynamic_shape_loop_type_for_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_compile_loop_type_for_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_loop_in_combine_fn_compile_mode_none_loop_type_for_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_dynamic_shape_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_compile_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_eager_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_contiguous_tensor_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_False_compile_mode_compile_dynamic_shape_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_compile_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_eager_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_eager_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_non_pointwise_generic_reverse_True_compile_mode_none_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_eager_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_none_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_generic_compile_mode_none_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_compile_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_combine_mode_pointwise_compile_mode_none_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_False_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_compile_dynamic_shape_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_generic_compile_mode_eager_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_dynamic_shape_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_reverse_True_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_compile_reverse_True_cuda, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_partial_grad_no_grad_combine_mode_pointwise_compile_mode_eager_reverse_False_cpu, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_generic_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_combine_mode_pointwise_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_compile_dynamic_shape_combine_mode_pointwise_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_generic_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_eager_combine_mode_pointwise_reverse_True_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_generic_reverse_False_cpu_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_generic_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_tuple_compile_mode_none_combine_mode_pointwise_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_False_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_dynamic_shape_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_compile_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_eager_reverse_False_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_eager_reverse_True_cuda_autograd_True, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_True_cpu_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_vmap_in_combine_fn_compile_mode_none_reverse_True_cuda_autograd_False, test/functorch/test_control_flow.py::AssociativeScanTests::test_associative_scan_wrong_pytree, test/functorch/test_control_flow.py::TestControlFlowTraced::test_compile_while_loop_stack_output_dynamic_False_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_compile_while_loop_stack_output_dynamic_False_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_autograd_backward, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_eager_run_with_item, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_input_mutation_on_false_branch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_input_mutation_on_true_branch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_functionalized_output_alias_input, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_nested_traced_fake_tensor, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_traced_not_nested_fake_tensor, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_boolTensor_innerFnType_object_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_function_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_module_nOperands_1_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_bool_innerFnType_object_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_module_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_floatTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_function_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_0_nClosure_1_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_module_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_0_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_tracing_with_valid_inputs_predType_intTensor_innerFnType_object_nOperands_1_nClosure_0_nesting_2, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_multiple_outputs_nClosure_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_multiple_outputs_nClosure_1, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_function_nOperands_1_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_module_nOperands_2_nClosure_0_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_object_nOperands_1_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_predType_boolTensor_innerFnType_object_nOperands_2_nClosure_1_nesting_0, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_vmap_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_module_param_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_cond_with_tensor_closure, test/functorch/test_control_flow.py::TestControlFlowTraced::test_map_functionalized_aot_func, test/functorch/test_control_flow.py::TestControlFlowTraced::test_map_functionalized_elem_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_merge_output, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_in_vmap_mixed_batch_dims, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_in_vmap_unbatched_x, test/functorch/test_control_flow.py::TestControlFlowTraced::test_scan_vmap_scan_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_real, test/functorch/test_control_flow.py::TestControlFlowTraced::test_tracing_map_symbolic_list, test/functorch/test_control_flow.py::TestControlFlowTraced::test_vmap_vmap_boolcond_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_const_and_symint_output, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_int_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_aot_eager_while_loop_test_pytree_int_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_compile_backend_eager_while_loop_test_simple_with_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_cpp_while_loop_test_simple_with_mutation, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_functorch_while_loop_test_simple, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_functionalize_func_type_python_while_loop_test_simple_with_pytree_carry, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_nested_traced, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_constant_and_symint_output_export_strict_True_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_int_carry_export_strict_False_dynamic_False, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_True_backend_aot_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_compile_dynamic_True_backend_eager, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_op_pytree_int_carry_export_strict_False_dynamic_True, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_simple_functionalize_check_graph_func_type_cpp, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_simple_functionalize_check_graph_func_type_functorch, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_nested, test/functorch/test_control_flow.py::TestControlFlowTraced::test_while_loop_tracing_while_loop_test_nested_with_linear, test/functorch/test_control_flow.py::TestHopSchema::test_associative_scan_gen_schema_with_additional_inputs, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_ScriptObj, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_float, test/functorch/test_control_flow.py::TestHopSchema::test_list_gen_schema_type_int, test/functorch/test_control_flow.py::TestHopSchema::test_schema_tree_spec, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_GraphModule, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_SymBool, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_SymInt, test/functorch/test_control_flow.py::TestHopSchema::test_type_gen_schema_type_float, test/functorch/test_control_flow.py::TestHopSchema::test_while_loop_gen_schema_with_additional_inputs
2025-12-04T14:08:01.3090133Z 
2025-12-04T14:08:01.3090396Z Finished functorch/test_control_flow 4/4 ... [2025-12-04 14:08:01.273013][18921.201304942], took 7.99min
2025-12-04T14:08:01.3091185Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_control_flow/functorch.test_control_flow-5ef9f5430d478fb5.xml
2025-12-04T14:08:02.9517712Z Uploading artifacts took 1.50 seconds
2025-12-04T14:08:02.9520068Z Running complex_tensor/test_complex_tensor 3/3 ... [2025-12-04 14:08:02.951762][18922.880058764]
2025-12-04T14:08:02.9520621Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:08:02.9523736Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'complex_tensor/test_complex_tensor.py', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:08:02.952120]
2025-12-04T14:11:18.7726133Z 
2025-12-04T14:11:18.7727980Z complex_tensor/test_complex_tensor 3/3 was successful, full logs can be found in artifacts with path test/test-reports/complex_tensor.test_complex_tensor_3.3_18c49b70545b8444_.log
2025-12-04T14:11:18.7791323Z Running 193 items in this shard: test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_abs_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_abs_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_abs_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_acos_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_acosh_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_acosh_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_add_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_addmm_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_addmm_decomposed_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_all_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_any_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_asinh_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_atan_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_atan_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_atanh_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_atanh_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_bmm_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_clone_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_conj_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_conj_physical_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_cos_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_cosh_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_cosh_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_cumprod_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_diagonal_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_div_no_rounding_mode_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_div_no_rounding_mode_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_eq_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_eq_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_eq_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_expm1_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_flatten_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_flip_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_full_like_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_index_add_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_index_select_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_index_select_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_isfinite_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_isinf_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_isnan_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_log_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_log_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_logical_and_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_logical_xor_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_mm_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_mul_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_ne_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_ne_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_new_zeros_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_permute_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_pow_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_prod_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_randn_like_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_real_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_repeat_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_rsqrt_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_rsqrt_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_rsub_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_scatter_add_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_select_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sgn_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sin_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sin_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_slice_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_split_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_split_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_split_list_args_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_split_with_sizes_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sqrt_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_squeeze_multiple_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_stack_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_stack_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sub_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sub_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sum_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sum_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_sum_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_tanh_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_true_divide_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_var_unbiased_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexTensorCUDA::test_consistency_where_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_abs_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_acos_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_acosh_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_asin_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_asinh_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_atan_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_atanh_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_conj_physical_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_cos_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_cumsum_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_dot_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_gather_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_imag_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_log1p_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_mm_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_mul_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_permute_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_repeat_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_scatter_add_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_split_list_args_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_split_with_sizes_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_squeeze_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_stack_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_sub_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_sum_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_unsqueeze_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_var_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_var_unbiased_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexBwdGradientsCUDA::test_fn_grad_zero__cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_abs_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_acosh_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_addmm_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_addmm_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_addmm_decomposed_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_all_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_angle_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_asin_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_asin_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_asin_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_asinh_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_atan_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_atan_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_atanh_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_atanh_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_bmm_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_cat_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_clone_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_conj_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_conj_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_constant_pad_nd_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_cos_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_cosh_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_cosh_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_cumprod_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_diagonal_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_diagonal_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_eq_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_exp_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_expand_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_expm1_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_expm1_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_flip_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_full_like_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_gather_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_imag_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_index_select_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_isclose_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_isinf_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_isnan_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_log1p_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_log1p_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_log_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_log_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_logical_not_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_logical_xor_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_masked_fill_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_masked_scatter_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_masked_scatter_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_mean_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_mul_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_ne_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_ne_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_neg_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_neg_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_new_zeros_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_permute_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_pow_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_prod_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_prod_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_reciprocal_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_repeat_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_rsqrt_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_sin_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_sinh_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_slice_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_split_list_args_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_sqrt_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_sqrt_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_squeeze_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_stack_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_stack_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_sub_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_t_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_tanh_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_transpose_cuda_complex32, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_var_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_var_unbiased_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_view_as_real_cuda_complex128, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_view_as_real_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_view_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_where_cuda_complex64, test/complex_tensor/test_complex_tensor.py::TestComplexDistributedCUDA::test_distributed_zeros_like_cuda_complex32
2025-12-04T14:11:18.7852338Z 
2025-12-04T14:11:18.7852713Z Finished complex_tensor/test_complex_tensor 3/3 ... [2025-12-04 14:11:18.772805][19118.701102271], took 3.26min
2025-12-04T14:11:18.8063012Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/complex_tensor.test_complex_tensor/complex_tensor.test_complex_tensor-b8215f419723e2db.xml
2025-12-04T14:11:18.8913427Z Running optim/test_optim 1/1 ... [2025-12-04 14:11:18.891097][19118.819396713]
2025-12-04T14:11:18.8913913Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:11:18.8916863Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'optim/test_optim.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:11:18.891408]
2025-12-04T14:11:21.5614781Z 
2025-12-04T14:11:21.5615911Z optim/test_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/optim.test_optim_1.1_96914e6399c32275_.log
2025-12-04T14:11:21.5616625Z 
2025-12-04T14:11:21.5616921Z Finished optim/test_optim 1/1 ... [2025-12-04 14:11:21.561232][19121.489528287], took 0.04min
2025-12-04T14:11:21.5946193Z Running torch_np/numpy_tests/fft/test_pocketfft 1/1 ... [2025-12-04 14:11:21.594381][19121.522680798]
2025-12-04T14:11:21.5946955Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:11:21.5949588Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/fft/test_pocketfft.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:11:21.594692]
2025-12-04T14:11:27.0181565Z 
2025-12-04T14:11:27.0182688Z torch_np/numpy_tests/fft/test_pocketfft 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.fft.test_pocketfft_1.1_de16653e9fff9a91_.log
2025-12-04T14:11:27.0205071Z Running 79 items in this shard: test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTShift::test_fft_n, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_all_1d_norm_preserving, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_dtypes_dtype0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_dtypes_dtype1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_dtypes_dtype2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_hfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_identity, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm_backward, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm_forward, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm_ortho, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ihfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_irfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_irfft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_irfftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_rfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_rfft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_rfftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_fft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_ifft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_irfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_rfft
2025-12-04T14:11:27.0225646Z 
2025-12-04T14:11:27.0226030Z Finished torch_np/numpy_tests/fft/test_pocketfft 1/1 ... [2025-12-04 14:11:27.018047][19126.946343927], took 0.09min
2025-12-04T14:11:27.0513421Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_pocketfft/torch_np.numpy_tests.fft.test_pocketfft-bedc5291ea06d7d4.xml
2025-12-04T14:11:27.1440662Z Running functorch/test_ops 1/9 ... [2025-12-04 14:11:27.143788][19127.072087697]
2025-12-04T14:11:27.1441536Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:11:27.1443475Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=1', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:11:27.144087]
2025-12-04T14:17:38.0423424Z 
2025-12-04T14:17:38.0424368Z functorch/test_ops 1/9 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_1.9_c0eee613fccab590_.log
2025-12-04T14:17:38.0726305Z Running 1130 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_nll_loss_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_softmax_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_topk_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_floor_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_maximum_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_T_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_broadcast_to_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_conj_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_flatten_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SelectGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ZeroGradientsGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___radd___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rdiv___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__segment_reduce_lengths_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcdiv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_floor_rounding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gradient_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_grid_sampler_3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hash_tensor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hypot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isneginf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cross_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_householder_product_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_singular_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_tensorinv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_or_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matrix_exp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_meshgrid_variadic_tensors_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_3_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanquantile_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_layer_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_groups_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_elu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_group_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_area_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_bicubic_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_linear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_trilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_margin_ranking_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool1d_grad_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_replicate_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nonzero_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ones_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rad2deg_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_cosine_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_mm_reduce_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_j1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_he_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i1e_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_i1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_with_sizes_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_square_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_t_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_transpose_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unique_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zero__cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32
2025-12-04T14:17:38.1018801Z 
2025-12-04T14:17:38.1019032Z Finished functorch/test_ops 1/9 ... [2025-12-04 14:17:38.043531][19497.971824397], took 6.18min
2025-12-04T14:17:38.1019769Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-7c0d16c38d6c5d66.xml
2025-12-04T14:17:38.1967942Z Running functorch/test_ops 6/9 ... [2025-12-04 14:17:38.196551][19498.124848884]
2025-12-04T14:17:38.1968363Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:17:38.1970661Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=6', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:17:38.196836]
2025-12-04T14:26:24.8016629Z 
2025-12-04T14:26:24.8020071Z functorch/test_ops 6/9 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_6.9_28c20a9b783bf56f_.log
2025-12-04T14:26:24.8310586Z Running 1090 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amin_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmin_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmin_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ceil_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_clamp_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_clamp_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_floor_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ge_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_gt_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_le_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_lt_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_minimum_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_sort_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_flatten_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_dsplit_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_mH_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_mT_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_reshape_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_H_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyExpMarkDirtyAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpySortAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmatmul___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmul___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rsub___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__softmax_backward_data_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__upsample_bilinear2d_aa_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argsort_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asinh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_block_diag_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ceil_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_physical_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cos_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cosh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumsum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_deg2rad_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_trunc_rounding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fill_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_divide_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_add_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_int_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lgamma_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cholesky_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigvals_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_triangular_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_not_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_log_softmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_logsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_normalize_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_select_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_with_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_minimum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_alpha_dropout_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_celu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_gelu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_kl_div_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_layer_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_outer_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_permute_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_blackman_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_j0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_y1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_t_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_erfcx_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_u_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_multiple_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_sparse_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unique_consecutive_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_unbiased_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32
2025-12-04T14:26:24.8592178Z 
2025-12-04T14:26:24.8592422Z Finished functorch/test_ops 6/9 ... [2025-12-04 14:26:24.803228][20024.731520754], took 8.78min
2025-12-04T14:26:24.8593151Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-53045e36c53c5e0b.xml
2025-12-04T14:26:24.9525872Z Running torch_np/numpy_tests/core/test_getlimits 1/1 ... [2025-12-04 14:26:24.952347][20024.880644606]
2025-12-04T14:26:24.9526391Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:26:24.9529267Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_getlimits.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:26:24.952659]
2025-12-04T14:26:28.3233363Z 
2025-12-04T14:26:28.3234565Z torch_np/numpy_tests/core/test_getlimits 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_getlimits_1.1_82959dff790c271d_.log
2025-12-04T14:26:28.3240305Z Running 17 items in this shard: test/torch_np/numpy_tests/core/test_getlimits.py::TestPythonFloat::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestHalf::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestSingle::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestDouble::test_singleton, test/torch_np/numpy_tests/core/test_getlimits.py::TestFinfo::test_basic, test/torch_np/numpy_tests/core/test_getlimits.py::TestFinfo::test_basic_missing, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_basic, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T0, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T1, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T2, test/torch_np/numpy_tests/core/test_getlimits.py::TestIinfo::test_unsigned_max_T3, test/torch_np/numpy_tests/core/test_getlimits.py::TestRepr::test_finfo_repr, test/torch_np/numpy_tests/core/test_getlimits.py::TestRepr::test_iinfo_repr, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_instances, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_known_types, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_plausible_finfo, test/torch_np/numpy_tests/core/test_getlimits.py::TestMisc::test_subnormal_warning
2025-12-04T14:26:28.3244758Z 
2025-12-04T14:26:28.3245024Z Finished torch_np/numpy_tests/core/test_getlimits 1/1 ... [2025-12-04 14:26:28.322993][20028.251284904], took 0.06min
2025-12-04T14:26:28.3576945Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_getlimits/torch_np.numpy_tests.core.test_getlimits-3f363f609079497e.xml
2025-12-04T14:26:28.4328578Z Running torch_np/test_ndarray_methods 1/1 ... [2025-12-04 14:26:28.432554][20028.360849183]
2025-12-04T14:26:28.4329025Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:26:28.4332480Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_ndarray_methods.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:26:28.432994]
2025-12-04T14:26:34.7580829Z 
2025-12-04T14:26:34.7582392Z torch_np/test_ndarray_methods 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_ndarray_methods_1.1_6320f1f29b59eecd_.log
2025-12-04T14:26:34.7674672Z Running 342 items in this shard: test/torch_np/test_ndarray_methods.py::TestIndexing::test_indexing_simple, test/torch_np/test_ndarray_methods.py::TestIndexing::test_setitem, test/torch_np/test_ndarray_methods.py::TestReshape::test_reshape_function, test/torch_np/test_ndarray_methods.py::TestReshape::test_reshape_method, test/torch_np/test_ndarray_methods.py::TestTranspose::test_transpose_function, test/torch_np/test_ndarray_methods.py::TestTranspose::test_transpose_method, test/torch_np/test_ndarray_methods.py::TestRavel::test_ravel_function, test/torch_np/test_ndarray_methods.py::TestRavel::test_ravel_method, test/torch_np/test_ndarray_methods.py::TestNonzero::test_array_method, test/torch_np/test_ndarray_methods.py::TestNonzero::test_nonzero_onedim, test/torch_np/test_ndarray_methods.py::TestNonzero::test_nonzero_trivial, test/torch_np/test_ndarray_methods.py::TestNonzero::test_nonzero_twodim, test/torch_np/test_ndarray_methods.py::TestNonzero::test_sparse, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_all_method_max, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_all_method_min, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size0_axis0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size0_axis0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size10_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size10_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size11_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size11_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size12_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size12_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size13_axis13_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size13_axis13_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size14_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size14_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size15_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size15_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size16_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size16_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size17_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size17_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size18_axis18_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size18_axis18_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size19_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size19_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size1_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size1_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size20_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size20_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size21_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size21_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size22_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size22_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size23_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size23_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size24_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size24_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size25_axis25_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size25_axis25_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size26_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size26_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size27_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size27_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size28_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size28_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size29_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size29_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size2_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size2_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size30_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size30_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size31_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size31_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size32_axis32_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size32_axis32_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size33_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size33_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size34_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size34_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size35_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size35_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size36_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size36_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size37_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size37_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size38_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size38_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size39_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size39_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size3_axis3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size3_axis3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size40_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size40_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size41_axis41_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size41_axis41_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size42_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size42_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size43_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size43_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size44_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size44_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size45_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size45_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size46_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size46_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size47_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size47_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size48_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size48_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size49_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size49_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size4_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size4_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size50_axis50_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size50_axis50_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size51_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size51_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size52_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size52_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size53_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size53_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size54_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size54_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size55_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size55_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size56_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size56_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size57_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size57_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size58_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size58_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size59_axis59_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size59_axis59_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size5_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size5_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size60_axis_-4_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size60_axis_-4_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size61_axis_-3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size61_axis_-3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size62_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size62_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size63_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size63_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size64_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size64_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size65_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size65_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size66_axis_2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size66_axis_2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size67_axis_3_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size67_axis_3_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size68_axis68_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size68_axis68_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size69_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size69_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size6_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size6_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size70_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size70_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size71_axis71_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size71_axis71_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size72_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size72_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size73_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size73_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size74_axis74_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size74_axis74_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size75_axis_-1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size75_axis_-1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size76_axis_0_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size76_axis_0_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size77_axis77_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size77_axis77_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size7_axis_1_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size7_axis_1_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size8_axis8_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size8_axis8_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size9_axis_-2_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size9_axis_-2_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_arr_method_argmax_np_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_arr_method_argmin_np_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_positional_arr_method_argmax_np_method0, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_np_vs_ndarray_positional_arr_method_argmin_np_method1, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_output_shape_method_argmax, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_output_shape_method_argmin, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_0_method_argmax, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_0_method_argmin, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_1_method_argmax, test/torch_np/test_ndarray_methods.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_1_method_argmin, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data0, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data1, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data10, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data11, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data12, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data13, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data14, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data15, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data16, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data17, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data18, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data19, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data2, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data20, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data21, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data22, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data23, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data24, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data25, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data26, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data27, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data28, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data29, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data3, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data30, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data31, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data32, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data33, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data34, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data35, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data36, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data37, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data38, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data39, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data4, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data40, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data41, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data42, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data43, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data44, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data45, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data46, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data47, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data48, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data49, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data5, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data50, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data51, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data52, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data53, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data54, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data55, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data56, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data57, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data58, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data59, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data6, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data60, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data61, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data62, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data63, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data64, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data65, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data66, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data67, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data68, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data69, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data7, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data70, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data71, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data72, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data73, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data8, test/torch_np/test_ndarray_methods.py::TestArgmax::test_combinations_data9, test/torch_np/test_ndarray_methods.py::TestArgmax::test_maximum_signed_integers, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data0, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data1, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data10, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data11, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data12, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data13, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data14, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data15, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data16, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data17, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data18, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data19, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data2, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data20, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data21, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data22, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data23, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data24, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data25, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data26, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data27, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data28, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data29, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data3, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data30, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data31, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data32, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data33, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data34, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data35, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data36, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data37, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data38, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data39, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data4, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data40, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data41, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data42, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data43, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data44, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data45, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data46, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data47, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data48, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data49, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data5, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data50, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data51, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data52, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data53, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data54, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data55, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data56, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data57, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data58, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data59, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data6, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data60, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data61, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data62, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data63, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data64, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data65, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data66, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data67, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data68, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data69, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data7, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data70, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data71, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data72, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data73, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data8, test/torch_np/test_ndarray_methods.py::TestArgmin::test_combinations_data9, test/torch_np/test_ndarray_methods.py::TestArgmin::test_minimum_signed_integers, test/torch_np/test_ndarray_methods.py::TestAmax::test_basic, test/torch_np/test_ndarray_methods.py::TestAmin::test_basic, test/torch_np/test_ndarray_methods.py::TestContains::test_contains, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_fn, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_ivar, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_method, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_name, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_plain, test/torch_np/test_ndarray_methods.py::TestNoExtraMethods::test_extra_methods_name_rvar, test/torch_np/test_ndarray_methods.py::TestIter::test_iter_1d, test/torch_np/test_ndarray_methods.py::TestIter::test_iter_2d
2025-12-04T14:26:34.7763773Z 
2025-12-04T14:26:34.7764025Z Finished torch_np/test_ndarray_methods 1/1 ... [2025-12-04 14:26:34.758691][20034.686985782], took 0.11min
2025-12-04T14:26:34.7930900Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_ndarray_methods/torch_np.test_ndarray_methods-84ccb0941d397f92.xml
2025-12-04T14:26:34.8791631Z Running test_view_ops 1/1 ... [2025-12-04 14:26:34.878934][20034.807231487]
2025-12-04T14:26:34.8792040Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:26:34.8794965Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_view_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:26:34.879242]
2025-12-04T14:26:49.7676355Z 
2025-12-04T14:26:49.7677677Z test_view_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_view_ops_1.1_cb0863e2f971ce65_.log
2025-12-04T14:26:49.7734309Z Running 279 items in this shard: test/test_view_ops.py::TestViewOpsCUDA::test_T_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_advanced_indexing_assignment_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_advanced_indexing_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_gradients_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_ellipses_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_newaxis_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_slice_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_chunk_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_conj_imag_view_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_conj_imag_view_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_conj_view_with_shared_memory_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_contiguous_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_contiguous_self_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_diagonal_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_expand_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_expand_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_flatten_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_flatten_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_movedim_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_narrow_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_permute_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_real_imag_view_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_real_imag_view_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_select_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_bool, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int8, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_bool, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int8, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_split_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_squeeze_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_squeeze_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_t_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_t_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_transpose_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_transpose_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unbind_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unbind_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unfold_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unsqueeze_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unsqueeze_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_complex_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex32, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_out_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_output_contiguous_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_view_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_T_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_as_strided_overflow_storage_offset_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_gradient_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_big_transpose_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_shapes_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_tensors_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_chunk_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_conj_neg_view_numpy_error_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_contiguous_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_crow_col_indices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_empty_reshape_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_expand_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_flatten_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_memory_format_resize__cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_memory_format_resize_as_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_narrow_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_narrow_tensor_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_python_types_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_ravel_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_bfloat16, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_as_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_as_preserves_strides_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_overflow_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_split_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_t_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_errors_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_bfloat16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_bfloat16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_unsqueeze_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_view_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_view_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_view_empty_cuda
2025-12-04T14:26:49.7788739Z 
2025-12-04T14:26:49.7788928Z Finished test_view_ops 1/1 ... [2025-12-04 14:26:49.767851][20049.696144943], took 0.25min
2025-12-04T14:26:49.8019191Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_view_ops/test_view_ops-7140a1ac93a67fd6.xml
2025-12-04T14:26:49.8805976Z Running test_nn 1/1 ... [2025-12-04 14:26:49.880357][20049.808654594]
2025-12-04T14:26:49.8806360Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:26:49.8809067Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nn.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:26:49.880638]
2025-12-04T14:32:36.6809209Z 
2025-12-04T14:32:36.6809961Z test_nn 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_nn_1.1_a9c5853bd400a590_.log
2025-12-04T14:32:36.7652080Z Running 2436 items in this shard: test/test_nn.py::TestNN::test_AdaptiveLogSoftmax, test/test_nn.py::TestNN::test_AdaptiveLogSoftmax_cuda_fp32, test/test_nn.py::TestNN::test_AdaptiveLogSoftmax_cuda_tf32, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_none, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_BCELoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_BCELoss_no_reduce, test/test_nn.py::TestNN::test_BCELoss_no_reduce_cuda, test/test_nn.py::TestNN::test_BCELoss_no_reduce_scalar, test/test_nn.py::TestNN::test_BCELoss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_BCELoss_weights_no_reduce, test/test_nn.py::TestNN::test_BCELoss_weights_no_reduce_cuda, test/test_nn.py::TestNN::test_BCELoss_weights_no_reduce_scalar, test/test_nn.py::TestNN::test_BCELoss_weights_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_legacy_enum, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_legacy_enum_cuda, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_reduce, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_reduce_scalar, test/test_nn.py::TestNN::test_BCEWithLogitsLoss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_CELU_no_batch_dim, test/test_nn.py::TestNN::test_CELU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_CTCLoss_critical_target_len, test/test_nn.py::TestNN::test_CTCLoss_lengthchecks_cpu, test/test_nn.py::TestNN::test_CTCLoss_lengthchecks_cuda, test/test_nn.py::TestNN::test_CTCLoss_long_targets, test/test_nn.py::TestNN::test_CTCLoss_typechecks, test/test_nn.py::TestNN::test_CTCLoss_zero_infinity, test/test_nn.py::TestNN::test_CTCLoss_zero_lengths, test/test_nn.py::TestNN::test_Conv1d, test/test_nn.py::TestNN::test_Conv1d_circular_stride2_pad2, test/test_nn.py::TestNN::test_Conv1d_circular_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_circular_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_dilated, test/test_nn.py::TestNN::test_Conv1d_dilated_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_dilated_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_groups, test/test_nn.py::TestNN::test_Conv1d_groups_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_groups_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad1, test/test_nn.py::TestNN::test_Conv1d_pad1_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad1_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad1size1, test/test_nn.py::TestNN::test_Conv1d_pad1size1_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad1size1_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad2, test/test_nn.py::TestNN::test_Conv1d_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad2size1, test/test_nn.py::TestNN::test_Conv1d_pad2size1_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad2size1_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad_same, test/test_nn.py::TestNN::test_Conv1d_pad_same2, test/test_nn.py::TestNN::test_Conv1d_pad_same2_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad_same2_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad_same_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad_same_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad_same_dilated, test/test_nn.py::TestNN::test_Conv1d_pad_same_dilated_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad_same_dilated_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_pad_valid, test/test_nn.py::TestNN::test_Conv1d_pad_valid_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_pad_valid_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_reflect_stride2_pad2, test/test_nn.py::TestNN::test_Conv1d_reflect_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_reflect_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_replicate_stride2_pad2, test/test_nn.py::TestNN::test_Conv1d_replicate_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_replicate_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_stride, test/test_nn.py::TestNN::test_Conv1d_stride_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_stride_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_zero_batch, test/test_nn.py::TestNN::test_Conv1d_zero_batch_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_zero_batch_cuda_tf32, test/test_nn.py::TestNN::test_Conv1d_zeros_stride2_pad2, test/test_nn.py::TestNN::test_Conv1d_zeros_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv1d_zeros_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d, test/test_nn.py::TestNN::test_Conv2d_circular_stride2_pad2, test/test_nn.py::TestNN::test_Conv2d_circular_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_circular_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_depthwise, test/test_nn.py::TestNN::test_Conv2d_depthwise_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_depthwise_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_depthwise_dilated, test/test_nn.py::TestNN::test_Conv2d_depthwise_dilated_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_depthwise_dilated_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_depthwise_padded, test/test_nn.py::TestNN::test_Conv2d_depthwise_padded_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_depthwise_padded_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_depthwise_strided, test/test_nn.py::TestNN::test_Conv2d_depthwise_strided_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_depthwise_strided_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_depthwise_with_multiplier, test/test_nn.py::TestNN::test_Conv2d_depthwise_with_multiplier_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_depthwise_with_multiplier_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_dilated, test/test_nn.py::TestNN::test_Conv2d_dilated_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_dilated_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_dilated_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_dilated_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_dilated_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_groups, test/test_nn.py::TestNN::test_Conv2d_groups_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_groups_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_groups_thnn, test/test_nn.py::TestNN::test_Conv2d_groups_thnn_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_groups_thnn_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_groups_thnn_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_groups_thnn_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_groups_thnn_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_groups_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_groups_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_groups_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_no_bias, test/test_nn.py::TestNN::test_Conv2d_no_bias_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_no_bias_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_no_bias_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_no_bias_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_no_bias_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_pad_same, test/test_nn.py::TestNN::test_Conv2d_pad_same_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_pad_same_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_pad_same_dilated, test/test_nn.py::TestNN::test_Conv2d_pad_same_dilated_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_pad_same_dilated_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_pad_valid, test/test_nn.py::TestNN::test_Conv2d_pad_valid_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_pad_valid_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_padding, test/test_nn.py::TestNN::test_Conv2d_padding_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_padding_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_padding_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_padding_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_padding_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_reflect_stride2_pad2, test/test_nn.py::TestNN::test_Conv2d_reflect_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_reflect_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_replicate_stride2_pad2, test/test_nn.py::TestNN::test_Conv2d_replicate_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_replicate_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_strided, test/test_nn.py::TestNN::test_Conv2d_strided_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_strided_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_strided_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_strided_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_strided_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_zero_batch, test/test_nn.py::TestNN::test_Conv2d_zero_batch_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_zero_batch_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_zero_batch_with_long_tensor, test/test_nn.py::TestNN::test_Conv2d_zero_batch_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_zero_batch_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv2d_zeros_stride2_pad2, test/test_nn.py::TestNN::test_Conv2d_zeros_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv2d_zeros_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_1x1x1_no_bias_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_circular_stride2_pad2, test/test_nn.py::TestNN::test_Conv3d_circular_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_circular_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_dilated, test/test_nn.py::TestNN::test_Conv3d_dilated_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_dilated_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_dilated_strided, test/test_nn.py::TestNN::test_Conv3d_dilated_strided_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_dilated_strided_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_groups, test/test_nn.py::TestNN::test_Conv3d_groups_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_groups_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_groups_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_groups_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_groups_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_no_bias, test/test_nn.py::TestNN::test_Conv3d_no_bias_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_no_bias_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_no_bias_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_no_bias_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_no_bias_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_pad_same, test/test_nn.py::TestNN::test_Conv3d_pad_same_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_pad_same_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_pad_same_dilated, test/test_nn.py::TestNN::test_Conv3d_pad_same_dilated_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_pad_same_dilated_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_pad_valid, test/test_nn.py::TestNN::test_Conv3d_pad_valid_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_pad_valid_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_replicate_stride2_pad2, test/test_nn.py::TestNN::test_Conv3d_replicate_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_replicate_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_stride, test/test_nn.py::TestNN::test_Conv3d_stride_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_stride_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_stride_padding, test/test_nn.py::TestNN::test_Conv3d_stride_padding_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_stride_padding_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_stride_padding_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_stride_padding_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_stride_padding_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_stride_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_stride_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_stride_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_zero_batch, test/test_nn.py::TestNN::test_Conv3d_zero_batch_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_zero_batch_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_zero_batch_with_long_tensor, test/test_nn.py::TestNN::test_Conv3d_zero_batch_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_zero_batch_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_Conv3d_zeros_stride2_pad2, test/test_nn.py::TestNN::test_Conv3d_zeros_stride2_pad2_cuda_fp32, test/test_nn.py::TestNN::test_Conv3d_zeros_stride2_pad2_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose1d, test/test_nn.py::TestNN::test_ConvTranspose1d_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose1d_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose1d_dilated, test/test_nn.py::TestNN::test_ConvTranspose1d_dilated_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose1d_dilated_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose1d_groups, test/test_nn.py::TestNN::test_ConvTranspose1d_groups_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose1d_groups_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose1d_no_bias, test/test_nn.py::TestNN::test_ConvTranspose1d_no_bias_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose1d_no_bias_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d, test/test_nn.py::TestNN::test_ConvTranspose2d_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated_with_long_tensor, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_dilated_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d_groups, test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d_groups_with_long_tensor, test/test_nn.py::TestNN::test_ConvTranspose2d_groups_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_groups_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias_with_long_tensor, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_no_bias_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose2d_with_long_tensor, test/test_nn.py::TestNN::test_ConvTranspose2d_with_long_tensor_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose2d_with_long_tensor_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose3d, test/test_nn.py::TestNN::test_ConvTranspose3d_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose3d_cuda_tf32, test/test_nn.py::TestNN::test_ConvTranspose3d_dilated, test/test_nn.py::TestNN::test_ConvTranspose3d_dilated_cuda_fp32, test/test_nn.py::TestNN::test_ConvTranspose3d_dilated_cuda_tf32, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_CosineEmbeddingLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_CrossMapLRN2d, test/test_nn.py::TestNN::test_CrossMapLRN2d_cuda, test/test_nn.py::TestNN::test_ELU_no_batch_dim, test/test_nn.py::TestNN::test_ELU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Embedding, test/test_nn.py::TestNN::test_EmbeddingBag_discontiguous, test/test_nn.py::TestNN::test_EmbeddingBag_discontiguous_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_max, test/test_nn.py::TestNN::test_EmbeddingBag_max_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_max_padding_idx, test/test_nn.py::TestNN::test_EmbeddingBag_max_padding_idx_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_mean, test/test_nn.py::TestNN::test_EmbeddingBag_mean_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_mean_padding_idx, test/test_nn.py::TestNN::test_EmbeddingBag_mean_padding_idx_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_sparse, test/test_nn.py::TestNN::test_EmbeddingBag_sparse_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_sum, test/test_nn.py::TestNN::test_EmbeddingBag_sum_cuda, test/test_nn.py::TestNN::test_EmbeddingBag_sum_padding_idx, test/test_nn.py::TestNN::test_EmbeddingBag_sum_padding_idx_cuda, test/test_nn.py::TestNN::test_Embedding_cuda, test/test_nn.py::TestNN::test_Embedding_discontiguous, test/test_nn.py::TestNN::test_Embedding_discontiguous_cuda, test/test_nn.py::TestNN::test_Embedding_sparse, test/test_nn.py::TestNN::test_Embedding_sparse_cuda, test/test_nn.py::TestNN::test_Flatten, test/test_nn.py::TestNN::test_Flatten_cuda, test/test_nn.py::TestNN::test_Flatten_no_batch_dim, test/test_nn.py::TestNN::test_Flatten_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Fold, test/test_nn.py::TestNN::test_Fold_cuda, test/test_nn.py::TestNN::test_Fold_int_input, test/test_nn.py::TestNN::test_Fold_int_input_cuda, test/test_nn.py::TestNN::test_Fold_no_batch_dim_input, test/test_nn.py::TestNN::test_Fold_no_batch_dim_input_cuda, test/test_nn.py::TestNN::test_Fold_no_batch_dim_int_input, test/test_nn.py::TestNN::test_Fold_no_batch_dim_int_input_cuda, test/test_nn.py::TestNN::test_GELU_no_batch_dim, test/test_nn.py::TestNN::test_GELU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_GLU_no_batch_dim, test/test_nn.py::TestNN::test_GLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Hardshrink_no_batch_dim, test/test_nn.py::TestNN::test_Hardshrink_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Hardsigmoid_no_batch_dim, test/test_nn.py::TestNN::test_Hardsigmoid_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Hardswish_no_batch_dim, test/test_nn.py::TestNN::test_Hardswish_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Hardtanh_no_batch_dim, test/test_nn.py::TestNN::test_Hardtanh_no_batch_dim_cuda, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_margin_no_reduce, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_margin_no_reduce_cuda, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_reduce, test/test_nn.py::TestNN::test_HingeEmbeddingLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_HuberLoss_delta, test/test_nn.py::TestNN::test_HuberLoss_delta_cuda, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_HuberLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_KLDivLoss_batch_mean, test/test_nn.py::TestNN::test_KLDivLoss_batch_mean_log_target, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_KLDivLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_log_target, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_log_target_cuda, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_scalar, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_scalar_log_target, test/test_nn.py::TestNN::test_KLDivLoss_no_reduce_scalar_log_target_cuda, test/test_nn.py::TestNN::test_KLDivLoss_with_log_target_no_reduce, test/test_nn.py::TestNN::test_KLDivLoss_with_log_target_no_reduce_cuda, test/test_nn.py::TestNN::test_KLDivLoss_with_target_no_reduce, test/test_nn.py::TestNN::test_KLDivLoss_with_target_no_reduce_cuda, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_mean, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_none, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_sum, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_L1Loss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_L1Loss_no_reduce, test/test_nn.py::TestNN::test_L1Loss_no_reduce_complex, test/test_nn.py::TestNN::test_L1Loss_no_reduce_complex_cuda, test/test_nn.py::TestNN::test_L1Loss_no_reduce_cuda, test/test_nn.py::TestNN::test_L1Loss_no_reduce_scalar, test/test_nn.py::TestNN::test_L1Loss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_LSTM_cell, test/test_nn.py::TestNN::test_LSTM_cell_forward_hidden_size, test/test_nn.py::TestNN::test_LSTM_cell_forward_input_size, test/test_nn.py::TestNN::test_LayerNorm_3d_no_affine_large_feature, test/test_nn.py::TestNN::test_LayerNorm_3d_no_affine_large_feature_cuda, test/test_nn.py::TestNN::test_LayerNorm_3d_no_affine_large_feature_eval, test/test_nn.py::TestNN::test_LayerNorm_3d_no_affine_large_feature_eval_cuda, test/test_nn.py::TestNN::test_LeakyReLU_no_batch_dim, test/test_nn.py::TestNN::test_LeakyReLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Linear, test/test_nn.py::TestNN::test_Linear_cuda_fp32, test/test_nn.py::TestNN::test_Linear_cuda_tf32, test/test_nn.py::TestNN::test_Linear_no_batch_dim, test/test_nn.py::TestNN::test_Linear_no_batch_dim_cuda_fp32, test/test_nn.py::TestNN::test_Linear_no_batch_dim_cuda_tf32, test/test_nn.py::TestNN::test_Linear_no_bias, test/test_nn.py::TestNN::test_Linear_no_bias_cuda_fp32, test/test_nn.py::TestNN::test_Linear_no_bias_cuda_tf32, test/test_nn.py::TestNN::test_LogSigmoid_no_batch_dim, test/test_nn.py::TestNN::test_LogSigmoid_no_batch_dim_cuda, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_none, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_MSELoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_MSELoss_no_reduce, test/test_nn.py::TestNN::test_MSELoss_no_reduce_cuda, test/test_nn.py::TestNN::test_MSELoss_no_reduce_scalar, test/test_nn.py::TestNN::test_MSELoss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_MarginRankingLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_MaxUnpool1d_net, test/test_nn.py::TestNN::test_MaxUnpool1d_net_cuda, test/test_nn.py::TestNN::test_MaxUnpool1d_net_no_batch_dim, test/test_nn.py::TestNN::test_MaxUnpool1d_net_no_batch_dim_cuda, test/test_nn.py::TestNN::test_MaxUnpool2d_net, test/test_nn.py::TestNN::test_MaxUnpool2d_net_cuda, test/test_nn.py::TestNN::test_MaxUnpool2d_net_no_batch_dim, test/test_nn.py::TestNN::test_MaxUnpool2d_net_no_batch_dim_cuda, test/test_nn.py::TestNN::test_MaxUnpool3d_net, test/test_nn.py::TestNN::test_MaxUnpool3d_net_cuda, test/test_nn.py::TestNN::test_MaxUnpool3d_net_no_batch_dim, test/test_nn.py::TestNN::test_MaxUnpool3d_net_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Mish_no_batch_dim, test/test_nn.py::TestNN::test_Mish_no_batch_dim_cuda, test/test_nn.py::TestNN::test_ModuleDict, test/test_nn.py::TestNN::test_ModuleList, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_0d_no_reduce, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_0d_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_1d_no_reduce, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_1d_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_index_neg, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_index_neg_cuda, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_reduce, test/test_nn.py::TestNN::test_MultiLabelMarginLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_reduce, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_weights_no_reduce, test/test_nn.py::TestNN::test_MultiLabelSoftMarginLoss_weights_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiMarginLoss_1d_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_1d_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiMarginLoss_margin_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_margin_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiMarginLoss_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiMarginLoss_p_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_p_no_reduce_cuda, test/test_nn.py::TestNN::test_MultiMarginLoss_weights_no_reduce, test/test_nn.py::TestNN::test_MultiMarginLoss_weights_no_reduce_cuda, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_cuda, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_ignore_index, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_ignore_index_cuda, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_weights, test/test_nn.py::TestNN::test_NLLLoss2d_no_reduce_weights_cuda, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_cuda, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_ignore_index, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_ignore_index_cuda, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_weights, test/test_nn.py::TestNN::test_NLLLossNd_no_reduce_weights_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_NLLLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_NLLLoss_no_reduce, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_ignore_index, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_ignore_index_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_ignore_index, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_ignore_index_cuda, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_ignore_index_neg, test/test_nn.py::TestNN::test_NLLLoss_no_reduce_weights_ignore_index_neg_cuda, test/test_nn.py::TestNN::test_PReLU_backward_requires_grad_false, test/test_nn.py::TestNN::test_PReLU_no_batch_dim, test/test_nn.py::TestNN::test_PReLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_PairwiseDistance, test/test_nn.py::TestNN::test_PairwiseDistance_broadcast_lhs, test/test_nn.py::TestNN::test_PairwiseDistance_broadcast_lhs_cuda, test/test_nn.py::TestNN::test_PairwiseDistance_broadcast_rhs, test/test_nn.py::TestNN::test_PairwiseDistance_broadcast_rhs_cuda, test/test_nn.py::TestNN::test_PairwiseDistance_cuda, test/test_nn.py::TestNN::test_PairwiseDistance_no_batch_dim, test/test_nn.py::TestNN::test_PairwiseDistance_no_batch_dim_cuda, test/test_nn.py::TestNN::test_PairwiseDistance_with_non_default_args, test/test_nn.py::TestNN::test_PairwiseDistance_with_non_default_args_cuda, test/test_nn.py::TestNN::test_ParameterDict, test/test_nn.py::TestNN::test_ParameterDict_replication, test/test_nn.py::TestNN::test_ParameterList, test/test_nn.py::TestNN::test_ParameterList_meta, test/test_nn.py::TestNN::test_ParameterList_replication, test/test_nn.py::TestNN::test_PixelShuffle, test/test_nn.py::TestNN::test_PixelShuffle_cuda, test/test_nn.py::TestNN::test_PixelUnshuffle, test/test_nn.py::TestNN::test_PixelUnshuffle_cuda, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_reduce, test/test_nn.py::TestNN::test_PoissonNLLLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_RNN_cell, test/test_nn.py::TestNN::test_RNN_cell_forward_zero_hidden_size, test/test_nn.py::TestNN::test_RNN_cell_no_broadcasting, test/test_nn.py::TestNN::test_RNN_change_dropout, test/test_nn.py::TestNN::test_RNN_cpu_vs_cudnn_no_dropout, test/test_nn.py::TestNN::test_RNN_cpu_vs_cudnn_with_dropout, test/test_nn.py::TestNN::test_RNN_cudnn_weight_norm, test/test_nn.py::TestNN::test_RNN_dropout, test/test_nn.py::TestNN::test_RNN_dropout_state, test/test_nn.py::TestNN::test_RNN_input_size_zero, test/test_nn.py::TestNN::test_RNN_nonlinearity, test/test_nn.py::TestNN::test_RNN_nonlinearity_passed_as_arg, test/test_nn.py::TestNN::test_RReLU, test/test_nn.py::TestNN::test_RReLU_cuda, test/test_nn.py::TestNN::test_RReLU_no_batch_dim, test/test_nn.py::TestNN::test_RReLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_RReLU_with_up_down, test/test_nn.py::TestNN::test_RReLU_with_up_down_cuda, test/test_nn.py::TestNN::test_RReLU_with_up_down_scalar, test/test_nn.py::TestNN::test_RReLU_with_up_down_scalar_cuda, test/test_nn.py::TestNN::test_ReLU6_no_batch_dim, test/test_nn.py::TestNN::test_ReLU6_no_batch_dim_cuda, test/test_nn.py::TestNN::test_ReLU_no_batch_dim, test/test_nn.py::TestNN::test_ReLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_ReplicationPad3d, test/test_nn.py::TestNN::test_ReplicationPad3d_complex, test/test_nn.py::TestNN::test_ReplicationPad3d_complex_cuda, test/test_nn.py::TestNN::test_ReplicationPad3d_cuda, test/test_nn.py::TestNN::test_ReplicationPad3d_no_batch_dim, test/test_nn.py::TestNN::test_ReplicationPad3d_no_batch_dim_cuda, test/test_nn.py::TestNN::test_SELU_no_batch_dim, test/test_nn.py::TestNN::test_SELU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Sequential_add, test/test_nn.py::TestNN::test_Sequential_append, test/test_nn.py::TestNN::test_Sequential_delitem, test/test_nn.py::TestNN::test_Sequential_extend, test/test_nn.py::TestNN::test_Sequential_getitem, test/test_nn.py::TestNN::test_Sequential_iadd, test/test_nn.py::TestNN::test_Sequential_imul, test/test_nn.py::TestNN::test_Sequential_insert, test/test_nn.py::TestNN::test_Sequential_insert_fail_case, test/test_nn.py::TestNN::test_Sequential_mul, test/test_nn.py::TestNN::test_Sequential_pop, test/test_nn.py::TestNN::test_Sequential_rmul, test/test_nn.py::TestNN::test_Sequential_setitem, test/test_nn.py::TestNN::test_Sequential_setitem_named, test/test_nn.py::TestNN::test_SiLU_no_batch_dim, test/test_nn.py::TestNN::test_SiLU_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Sigmoid_no_batch_dim, test/test_nn.py::TestNN::test_Sigmoid_no_batch_dim_cuda, test/test_nn.py::TestNN::test_SmoothL1Loss_beta, test/test_nn.py::TestNN::test_SmoothL1Loss_beta_cuda, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_mean, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_none, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_sum, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_SmoothL1Loss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_SmoothL1Loss_no_reduce, test/test_nn.py::TestNN::test_SmoothL1Loss_no_reduce_cuda, test/test_nn.py::TestNN::test_SmoothL1Loss_no_reduce_scalar, test/test_nn.py::TestNN::test_SmoothL1Loss_no_reduce_scalar_cuda, test/test_nn.py::TestNN::test_SmoothL1Loss_zero_beta, test/test_nn.py::TestNN::test_SmoothL1Loss_zero_beta_cuda, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_SoftMarginLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_SoftMarginLoss_no_reduce, test/test_nn.py::TestNN::test_SoftMarginLoss_no_reduce_cuda, test/test_nn.py::TestNN::test_Softplus_no_batch_dim, test/test_nn.py::TestNN::test_Softplus_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Softshrink_no_batch_dim, test/test_nn.py::TestNN::test_Softshrink_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Softsign_no_batch_dim, test/test_nn.py::TestNN::test_Softsign_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Tanh_no_batch_dim, test/test_nn.py::TestNN::test_Tanh_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Tanhshrink_no_batch_dim, test/test_nn.py::TestNN::test_Tanhshrink_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Threshold_no_batch_dim, test/test_nn.py::TestNN::test_Threshold_no_batch_dim_cuda, test/test_nn.py::TestNN::test_TransformerDecoderLayer_gelu_activation, test/test_nn.py::TestNN::test_TransformerDecoderLayer_gelu_activation_cuda_fp32, test/test_nn.py::TestNN::test_TransformerDecoderLayer_gelu_activation_cuda_tf32, test/test_nn.py::TestNN::test_TransformerDecoderLayer_relu_activation, test/test_nn.py::TestNN::test_TransformerDecoderLayer_relu_activation_cuda_fp32, test/test_nn.py::TestNN::test_TransformerDecoderLayer_relu_activation_cuda_tf32, test/test_nn.py::TestNN::test_TransformerEncoderLayer_gelu_activation, test/test_nn.py::TestNN::test_TransformerEncoderLayer_gelu_activation_cuda_fp32, test/test_nn.py::TestNN::test_TransformerEncoderLayer_gelu_activation_cuda_tf32, test/test_nn.py::TestNN::test_TransformerEncoderLayer_relu_activation, test/test_nn.py::TestNN::test_TransformerEncoderLayer_relu_activation_cuda_fp32, test/test_nn.py::TestNN::test_TransformerEncoderLayer_relu_activation_cuda_tf32, test/test_nn.py::TestNN::test_Transformer_cell, test/test_nn.py::TestNN::test_Transformer_multilayer_coder, test/test_nn.py::TestNN::test_Transformer_multilayer_coder_cuda_fp32, test/test_nn.py::TestNN::test_Transformer_multilayer_coder_cuda_tf32, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_mean, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_mean_cuda_double, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_mean_cuda_fp32, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_mean_cuda_half, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_mean_cuda_tf32, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_none, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_none_cuda_double, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_none_cuda_fp32, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_none_cuda_half, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_none_cuda_tf32, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_sum, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_sum_cuda_double, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_sum_cuda_fp32, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_sum_cuda_half, test/test_nn.py::TestNN::test_TripletMarginLoss_no_batch_dim_sum_cuda_tf32, test/test_nn.py::TestNN::test_Unflatten_no_batch_dim, test/test_nn.py::TestNN::test_Unflatten_no_batch_dim_cuda, test/test_nn.py::TestNN::test_Unfold, test/test_nn.py::TestNN::test_Unfold_cuda, test/test_nn.py::TestNN::test_Unfold_int_input, test/test_nn.py::TestNN::test_Unfold_int_input_cuda, test/test_nn.py::TestNN::test_adaptive_log_softmax, test/test_nn.py::TestNN::test_add_module, test/test_nn.py::TestNN::test_add_module_raises_error_if_attr_exists, test/test_nn.py::TestNN::test_affine_grid, test/test_nn.py::TestNN::test_affine_grid_3d, test/test_nn.py::TestNN::test_affine_grid_backward_cl_cf_consistency_device_cpu_nd_2, test/test_nn.py::TestNN::test_affine_grid_backward_cl_cf_consistency_device_cpu_nd_3, test/test_nn.py::TestNN::test_affine_grid_backward_cl_cf_consistency_device_cuda_nd_2, test/test_nn.py::TestNN::test_affine_grid_backward_cl_cf_consistency_device_cuda_nd_3, test/test_nn.py::TestNN::test_affine_grid_error_checking, test/test_nn.py::TestNN::test_assignment, test/test_nn.py::TestNN::test_batch_norm_update_stats, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NCHW_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_NCHW_float32, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_NCHW_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_NCHW_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_inference_NHWC_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NCHW_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_NCHW_float32, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_NCHW_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_NCHW_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_2D_train_NHWC_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NCHW_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_NCHW_float32, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_NCHW_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_NCHW_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_inference_NHWC_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NCHW_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_NCHW_float32, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_NCHW_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_NCHW_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_cpu_float32, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_cpu_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_cpu_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_native_float32, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_native_mixed_bfloat16, test/test_nn.py::TestNN::test_batchnorm_3D_train_NHWC_vs_native_mixed_float16, test/test_nn.py::TestNN::test_batchnorm_buffer_update_when_stats_are_not_tracked, test/test_nn.py::TestNN::test_batchnorm_cudnn_half, test/test_nn.py::TestNN::test_batchnorm_cudnn_nhwc, test/test_nn.py::TestNN::test_batchnorm_half_overflow, test/test_nn.py::TestNN::test_batchnorm_load_state_dict, test/test_nn.py::TestNN::test_batchnorm_nhwc_cpu, test/test_nn.py::TestNN::test_batchnorm_nhwc_cuda, test/test_nn.py::TestNN::test_batchnorm_non_contig_cpu_BatchNorm2d, test/test_nn.py::TestNN::test_batchnorm_non_contig_cpu_SyncBatchNorm, test/test_nn.py::TestNN::test_batchnorm_nonaffine_cuda_half_input, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_bias_is_not_same_size_as_input, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_less_than_one_value_per_channel, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_running_mean_is_not_same_size_as_input, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_running_var_is_not_same_size_as_input, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_running_var_or_running_mean_have_forward_grad, test/test_nn.py::TestNN::test_batchnorm_raises_error_if_weight_is_not_same_size_as_input, test/test_nn.py::TestNN::test_bce_loss_always_nonnegative, test/test_nn.py::TestNN::test_bce_loss_broadcasts_weights, test/test_nn.py::TestNN::test_bce_loss_input_range, test/test_nn.py::TestNN::test_bce_loss_size_mismatch, test/test_nn.py::TestNN::test_bce_with_logits_broadcasts_pos_weights, test/test_nn.py::TestNN::test_bce_with_logits_broadcasts_weights, test/test_nn.py::TestNN::test_bce_with_logits_gives_same_result_as_sigmoid_and_bce_loss, test/test_nn.py::TestNN::test_bce_with_logits_gives_same_result_as_sigmoid_and_bce_loss_large_tensors_with_grad, test/test_nn.py::TestNN::test_bce_with_logits_has_correct_forward_grad, test/test_nn.py::TestNN::test_bce_with_logits_has_correct_grad_at_zero, test/test_nn.py::TestNN::test_bce_with_logits_ones_in_pos_weights_are_the_same_as_none, test/test_nn.py::TestNN::test_bce_with_logits_raises_if_target_and_input_are_different_size, test/test_nn.py::TestNN::test_bce_with_logits_stability, test/test_nn.py::TestNN::test_bce_with_logits_with_pos_weight_has_correct_grad_at_zero, test/test_nn.py::TestNN::test_bilinear, test/test_nn.py::TestNN::test_bilinear_broadcasting, test/test_nn.py::TestNN::test_bilinear_no_bias, test/test_nn.py::TestNN::test_bilinear_non_contiguous, test/test_nn.py::TestNN::test_bilinear_value_error, test/test_nn.py::TestNN::test_broadcast_double_backwards_gpu, test/test_nn.py::TestNN::test_broadcast_no_grad, test/test_nn.py::TestNN::test_broadcast_not_requiring_grad, test/test_nn.py::TestNN::test_buffer_bad_module_subclass, test/test_nn.py::TestNN::test_buffer_not_persistent, test/test_nn.py::TestNN::test_buffer_not_persistent_assign, test/test_nn.py::TestNN::test_buffer_not_persistent_del, test/test_nn.py::TestNN::test_buffer_not_persistent_load, test/test_nn.py::TestNN::test_buffer_not_persistent_overwrite, test/test_nn.py::TestNN::test_buffers_and_named_buffers, test/test_nn.py::TestNN::test_call_supports_python_dict_output, test/test_nn.py::TestNN::test_channel_shuffle_input_checks, test/test_nn.py::TestNN::test_channel_shuffle_return_alias_of_self, test/test_nn.py::TestNN::test_children, test/test_nn.py::TestNN::test_container_copy, test/test_nn.py::TestNN::test_convert_sync_batchnorm, test/test_nn.py::TestNN::test_cosine_embedding_loss_error_on_diff_shapes, test/test_nn.py::TestNN::test_cosine_embedding_loss_error_on_nonexpandable_shapes, test/test_nn.py::TestNN::test_cosine_embedding_loss_invalid_shape, test/test_nn.py::TestNN::test_cosine_embedding_loss_margin_no_reduce, test/test_nn.py::TestNN::test_cosine_embedding_loss_no_reduce, test/test_nn.py::TestNN::test_cosine_embedding_loss_with_diff_type, test/test_nn.py::TestNN::test_cosine_similarity, test/test_nn.py::TestNN::test_cross_entropy_loss, test/test_nn.py::TestNN::test_cross_entropy_loss_precision, test/test_nn.py::TestNN::test_cross_entropy_loss_zero_div, test/test_nn.py::TestNN::test_cudnn_forward_exception, test/test_nn.py::TestNN::test_cudnn_rnn_dropout_states_device, test/test_nn.py::TestNN::test_cudnn_weight_format, test/test_nn.py::TestNN::test_cudnn_weight_tying, test/test_nn.py::TestNN::test_dir, test/test_nn.py::TestNN::test_dir_digit, test/test_nn.py::TestNN::test_elu_inplace_gradgrad, test/test_nn.py::TestNN::test_elu_inplace_on_view, test/test_nn.py::TestNN::test_error_RNN_seq_len_zero, test/test_nn.py::TestNN::test_extra_state, test/test_nn.py::TestNN::test_extra_state_missing_get_extra_state, test/test_nn.py::TestNN::test_extra_state_missing_set_extra_state, test/test_nn.py::TestNN::test_extra_state_non_dict, test/test_nn.py::TestNN::test_fb_fc_packed, test/test_nn.py::TestNN::test_flatten, test/test_nn.py::TestNN::test_fold_invalid_arg, test/test_nn.py::TestNN::test_fractional_max_pool2d_invalid_output_ratio, test/test_nn.py::TestNN::test_gaussian_nll_loss_args, test/test_nn.py::TestNN::test_gaussian_nll_loss_broadcasting, test/test_nn.py::TestNN::test_gaussian_nll_loss_scalar_var, test/test_nn.py::TestNN::test_get_buffer, test/test_nn.py::TestNN::test_get_buffer_from_submodules, test/test_nn.py::TestNN::test_getattr_with_property, test/test_nn.py::TestNN::test_grid_sample, test/test_nn.py::TestNN::test_grid_sample_3d, test/test_nn.py::TestNN::test_grid_sample_error_checking, test/test_nn.py::TestNN::test_grid_sample_nearest_neighbor_rounding_mode_consistency, test/test_nn.py::TestNN::test_hardtanh_backward, test/test_nn.py::TestNN::test_hardtanh_inplace_gradgrad, test/test_nn.py::TestNN::test_huber_loss_invalid_delta, test/test_nn.py::TestNN::test_inplace_thnn, test/test_nn.py::TestNN::test_interpolate, test/test_nn.py::TestNN::test_interpolate_bicubic_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_2d_zero_dim, test/test_nn.py::TestNN::test_interpolate_bicubic_2d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_shared_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_shared_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_skewed_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_skewed_2d_align_corners, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_skewed_2d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_scale_tuple_skewed_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_tuple_2d, test/test_nn.py::TestNN::test_interpolate_bicubic_tuple_2d_align_corners, test/test_nn.py::TestNN::test_interpolate_bicubic_tuple_2d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_bicubic_tuple_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_2d, test/test_nn.py::TestNN::test_interpolate_bilinear_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_2d_zero_dim, test/test_nn.py::TestNN::test_interpolate_bilinear_2d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_2d, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_shared_2d, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_shared_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_skewed_2d, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_skewed_2d_align_corners, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_skewed_2d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_scale_tuple_skewed_2d_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_tuple_2d, test/test_nn.py::TestNN::test_interpolate_bilinear_tuple_2d_align_corners, test/test_nn.py::TestNN::test_interpolate_bilinear_tuple_2d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_bilinear_tuple_2d_cuda, test/test_nn.py::TestNN::test_interpolate_buffer_overflow, test/test_nn.py::TestNN::test_interpolate_illegal_memory_access, test/test_nn.py::TestNN::test_interpolate_linear_1d, test/test_nn.py::TestNN::test_interpolate_linear_1d_align_corners, test/test_nn.py::TestNN::test_interpolate_linear_1d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_linear_1d_cuda, test/test_nn.py::TestNN::test_interpolate_linear_1d_zero_dim, test/test_nn.py::TestNN::test_interpolate_linear_1d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_linear_scale_1d, test/test_nn.py::TestNN::test_interpolate_linear_scale_1d_align_corners, test/test_nn.py::TestNN::test_interpolate_linear_scale_1d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_linear_scale_1d_cuda, test/test_nn.py::TestNN::test_interpolate_linear_tuple_1d, test/test_nn.py::TestNN::test_interpolate_linear_tuple_1d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_1d, test/test_nn.py::TestNN::test_interpolate_nearest_1d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_1d_zero_dim, test/test_nn.py::TestNN::test_interpolate_nearest_1d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_2d, test/test_nn.py::TestNN::test_interpolate_nearest_2d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_2d_launch_configs, test/test_nn.py::TestNN::test_interpolate_nearest_2d_launch_configs_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_2d_zero_dim, test/test_nn.py::TestNN::test_interpolate_nearest_2d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_3d, test/test_nn.py::TestNN::test_interpolate_nearest_3d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_3d_zero_dim, test/test_nn.py::TestNN::test_interpolate_nearest_3d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_scale_1d, test/test_nn.py::TestNN::test_interpolate_nearest_scale_1d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_scale_2d, test/test_nn.py::TestNN::test_interpolate_nearest_scale_2d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_scale_3d, test/test_nn.py::TestNN::test_interpolate_nearest_scale_3d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_1d, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_1d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_2d, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_2d_cuda, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_3d, test/test_nn.py::TestNN::test_interpolate_nearest_tuple_3d_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_3d, test/test_nn.py::TestNN::test_interpolate_trilinear_3d_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_3d_zero_dim, test/test_nn.py::TestNN::test_interpolate_trilinear_3d_zero_dim_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_scale_3d, test/test_nn.py::TestNN::test_interpolate_trilinear_scale_3d_align_corners, test/test_nn.py::TestNN::test_interpolate_trilinear_scale_3d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_scale_3d_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_tuple_3d, test/test_nn.py::TestNN::test_interpolate_trilinear_tuple_3d_align_corners, test/test_nn.py::TestNN::test_interpolate_trilinear_tuple_3d_align_corners_cuda, test/test_nn.py::TestNN::test_interpolate_trilinear_tuple_3d_cuda, test/test_nn.py::TestNN::test_interpolate_undefined_behavior_casting, test/test_nn.py::TestNN::test_kl_div_log_softmax_target, test/test_nn.py::TestNN::test_kl_div_with_diff_type, test/test_nn.py::TestNN::test_kl_div_with_diff_type_log_target, test/test_nn.py::TestNN::test_l1_loss_correct, test/test_nn.py::TestNN::test_large_max_pool2d_ch_last, test/test_nn.py::TestNN::test_layer_norm_backwards_eps, test/test_nn.py::TestNN::test_layer_norm_eps, test/test_nn.py::TestNN::test_layer_norm_grads_with_create_graph_flag, test/test_nn.py::TestNN::test_layer_norm_large_tensor, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_bias_weightCOO, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_bias_weightCSC, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_bias_weightCSR, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_bias_weightStrided, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_nobias_weightCOO, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_nobias_weightCSC, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_nobias_weightCSR, test/test_nn.py::TestNN::test_linear_autograd_device_cpu_nobias_weightStrided, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_bias_weightCOO, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_bias_weightCSC, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_bias_weightCSR, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_bias_weightStrided, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_nobias_weightCOO, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_nobias_weightCSC, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_nobias_weightCSR, test/test_nn.py::TestNN::test_linear_autograd_device_cuda_nobias_weightStrided, test/test_nn.py::TestNN::test_linear_broadcasting, test/test_nn.py::TestNN::test_linear_raise_on_scalar_input, test/test_nn.py::TestNN::test_log_softmax_dim0, test/test_nn.py::TestNN::test_log_softmax_dim0_cuda, test/test_nn.py::TestNN::test_log_softmax_dim3, test/test_nn.py::TestNN::test_log_softmax_dim3_cuda, test/test_nn.py::TestNN::test_log_softmax_lastdim, test/test_nn.py::TestNN::test_log_softmax_lastdim_cuda, test/test_nn.py::TestNN::test_log_softmax_scalar, test/test_nn.py::TestNN::test_log_softmax_scalar_cuda, test/test_nn.py::TestNN::test_log_softmax_spatial, test/test_nn.py::TestNN::test_log_softmax_spatial_cuda, test/test_nn.py::TestNN::test_log_softmax_spatial_special, test/test_nn.py::TestNN::test_log_softmax_spatial_special_cuda, test/test_nn.py::TestNN::test_loss_equal_input_target_shape, test/test_nn.py::TestNN::test_margin_ranking_loss_margin_no_reduce, test/test_nn.py::TestNN::test_margin_ranking_loss_no_reduce, test/test_nn.py::TestNN::test_max_pool1d_invalid_output_size, test/test_nn.py::TestNN::test_module_apply_inplace_op, test/test_nn.py::TestNN::test_module_backcompat, test/test_nn.py::TestNN::test_module_super_init, test/test_nn.py::TestNN::test_module_to_argparse, test/test_nn.py::TestNN::test_modules, test/test_nn.py::TestNN::test_mse_loss_size_warning, test/test_nn.py::TestNN::test_multimarginloss_1d_input_0d_target_no_reduce, test/test_nn.py::TestNN::test_multimarginloss_1d_input_0d_target_no_reduce_cuda, test/test_nn.py::TestNN::test_named_children, test/test_nn.py::TestNN::test_named_modules, test/test_nn.py::TestNN::test_named_parameters_remove_duplicate, test/test_nn.py::TestNN::test_native_channel_shuffle_return_alias_of_self, test/test_nn.py::TestNN::test_nested_tensor_from_mask, test/test_nn.py::TestNN::test_nested_tensor_from_mask_error, test/test_nn.py::TestNN::test_no_grad, test/test_nn.py::TestNN::test_non_leaf_parameters, test/test_nn.py::TestNN::test_normalize, test/test_nn.py::TestNN::test_overwrite_module_params_on_conversion, test/test_nn.py::TestNN::test_pack_sequence_batch_sizes_throw, test/test_nn.py::TestNN::test_pad_scalar_error, test/test_nn.py::TestNN::test_padding_list, test/test_nn.py::TestNN::test_pairwise_distance, test/test_nn.py::TestNN::test_parameter_assignment, test/test_nn.py::TestNN::test_parameterlistdict_pickle, test/test_nn.py::TestNN::test_parameterlistdict_setting_attributes, test/test_nn.py::TestNN::test_parameters_and_named_parameters, test/test_nn.py::TestNN::test_parameters_to_vector, test/test_nn.py::TestNN::test_parse_to, test/test_nn.py::TestNN::test_partial_flat_weights, test/test_nn.py::TestNN::test_pdist, test/test_nn.py::TestNN::test_pdist_cpu_gradgrad_unimplemented, test/test_nn.py::TestNN::test_pdist_cuda_gradgrad_unimplemented, test/test_nn.py::TestNN::test_pdist_empty_col, test/test_nn.py::TestNN::test_pdist_empty_row, test/test_nn.py::TestNN::test_pdist_large, test/test_nn.py::TestNN::test_pdist_zeros, test/test_nn.py::TestNN::test_pickle_module_no_weights_only_warning, test/test_nn.py::TestNN::test_pixel_shuffle_nhwc_cpu, test/test_nn.py::TestNN::test_pixel_shuffle_unshuffle, test/test_nn.py::TestNN::test_pointwise_loss_broadcast, test/test_nn.py::TestNN::test_pointwise_loss_target_grad_none_reduction, test/test_nn.py::TestNN::test_projections_errors_on_gru_and_rnn, test/test_nn.py::TestNN::test_projections_lstm_args_check, test/test_nn.py::TestNN::test_projections_lstm_check_device, test/test_nn.py::TestNN::test_projections_lstm_initial_hidden_state, test/test_nn.py::TestNN::test_register_buffer_allows_overwriting_with_same_name, test/test_nn.py::TestNN::test_register_buffer_allows_tensor_like_object, test/test_nn.py::TestNN::test_register_buffer_raises_error_if_attr_exists, test/test_nn.py::TestNN::test_register_buffer_raises_error_if_name_is_not_string, test/test_nn.py::TestNN::test_register_buffer_raises_error_if_not_tensor, test/test_nn.py::TestNN::test_register_parameter_allows_overwriting_with_same_name, test/test_nn.py::TestNN::test_register_parameter_raises_error_if_attr_exists, test/test_nn.py::TestNN::test_register_parameter_raises_error_if_name_is_not_string, test/test_nn.py::TestNN::test_relu_inplace_on_view, test/test_nn.py::TestNN::test_repr, test/test_nn.py::TestNN::test_requires_grad_, test/test_nn.py::TestNN::test_rnn_args_check, test/test_nn.py::TestNN::test_rnn_check_device, test/test_nn.py::TestNN::test_rnn_initial_hidden_state, test/test_nn.py::TestNN::test_rnn_weight_norm, test/test_nn.py::TestNN::test_set_submodule, test/test_nn.py::TestNN::test_share_memory, test/test_nn.py::TestNN::test_smoothl1loss_intergral_target, test/test_nn.py::TestNN::test_smoothl1loss_negative_beta_not_supported, test/test_nn.py::TestNN::test_softmax_functional_dim0, test/test_nn.py::TestNN::test_softmax_functional_dim0_cuda, test/test_nn.py::TestNN::test_softmax_functional_dim3, test/test_nn.py::TestNN::test_softmax_functional_dim3_cuda, test/test_nn.py::TestNN::test_softmax_functional_scalar, test/test_nn.py::TestNN::test_softmax_functional_scalar_cuda, test/test_nn.py::TestNN::test_softmax_lastdim, test/test_nn.py::TestNN::test_softmax_lastdim_cuda, test/test_nn.py::TestNN::test_softmax_lastdim_dtype, test/test_nn.py::TestNN::test_softmax_lastdim_dtype_cuda, test/test_nn.py::TestNN::test_softmax_spatial, test/test_nn.py::TestNN::test_softmax_spatial_cuda, test/test_nn.py::TestNN::test_softmax_spatial_dtype, test/test_nn.py::TestNN::test_softmax_spatial_dtype_cuda, test/test_nn.py::TestNN::test_softmax_spatial_special, test/test_nn.py::TestNN::test_softmax_spatial_special_cuda, test/test_nn.py::TestNN::test_softmin, test/test_nn.py::TestNN::test_spectral_norm, test/test_nn.py::TestNN::test_spectral_norm_dim, test/test_nn.py::TestNN::test_spectral_norm_forward, test/test_nn.py::TestNN::test_spectral_norm_load_state_dict, test/test_nn.py::TestNN::test_spectral_norm_pickle, test/test_nn.py::TestNN::test_state_dict, test/test_nn.py::TestNN::test_swap_module_params_poisons_acc_grad, test/test_nn.py::TestNN::test_sync_batchnorm_accuracy_cuda, test/test_nn.py::TestNN::test_sync_batchnorm_backward_elemt, test/test_nn.py::TestNN::test_threshold_bfloat16_half, test/test_nn.py::TestNN::test_threshold_int, test/test_nn.py::TestNN::test_to, test/test_nn.py::TestNN::test_train_errors_for_invalid_mode, test/test_nn.py::TestNN::test_transformer_args_check, test/test_nn.py::TestNN::test_transformer_layer_args_check, test/test_nn.py::TestNN::test_transformerdecoder, test/test_nn.py::TestNN::test_transformerdecoderlayer, test/test_nn.py::TestNN::test_transformerdecoderlayer_gelu, test/test_nn.py::TestNN::test_triplet_margin_loss, test/test_nn.py::TestNN::test_triplet_margin_loss_no_reduce, test/test_nn.py::TestNN::test_triplet_margin_loss_swap, test/test_nn.py::TestNN::test_triplet_margin_loss_swap_no_reduce, test/test_nn.py::TestNN::test_type, test/test_nn.py::TestNN::test_unflatten, test/test_nn.py::TestNN::test_unflatten_invalid_arg, test/test_nn.py::TestNN::test_unfold_invalid_arg, test/test_nn.py::TestNN::test_upsamplingBilinear2d_spatial_invariance, test/test_nn.py::TestNN::test_upsamplingLinear1d, test/test_nn.py::TestNN::test_upsamplingLinear1d_spatial_invariance, test/test_nn.py::TestNN::test_upsamplingTrilinear3d_spatial_invariance, test/test_nn.py::TestNN::test_upsampling_bfloat16, test/test_nn.py::TestNN::test_upsampling_not_recompute_scale_factor, test/test_nn.py::TestNN::test_upsampling_small_scale, test/test_nn.py::TestNN::test_vector_to_parameters, test/test_nn.py::TestNN::test_weight_norm, test/test_nn.py::TestNN::test_weight_norm_pickle, test/test_nn.py::TestNN::test_weighted_huber_loss, test/test_nn.py::TestNN::test_weighted_l1_loss_with_weights, test/test_nn.py::TestNN::test_weighted_mse_loss, test/test_nn.py::TestNN::test_zero_grad, test/test_nn.py::TestFusionEval::test_fuse_module_eval_numerics, test/test_nn.py::TestConstantPadNd::test_constant_pad_nd, test/test_nn.py::TestConstantPadNd::test_preserves_memory_format, test/test_nn.py::TestAddRelu::test_add_relu, test/test_nn.py::TestAddRelu::test_add_relu_broadcasting, test/test_nn.py::TestFunctionalPickle::test_pickle_softsign, test/test_nn.py::TestFusionUtils::test_fuse_conv_bn_requires_grad, test/test_nn.py::TestFusionUtils::test_fuse_linear_bn_requires_grad, test/test_nn.py::TestUtils::test_consume_prefix_in_state_dict_if_present, test/test_nn.py::TestNNDeviceTypeCUDA::test_BatchNorm_empty_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_Bilinear_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_cudnn_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_empty_target_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_mean_use_module_form_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_mean_use_module_form_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_none_use_module_form_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_none_use_module_form_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_sum_use_module_form_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_no_batch_dim_reduction_sum_use_module_form_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GRU_grad_and_gradgrad_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_memory_format_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_numeric_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_GroupNorm_raises_error_if_one_value_per_group_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_InstanceNorm1d_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_InstanceNorm2d_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_InstanceNorm3d_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_LSTM_differentiable_backward_using_oneDNN_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_LSTM_differentiable_backward_using_oneDNN_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_LSTM_grad_and_gradgrad_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_LayerNorm_general_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_LayerNorm_numeric_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_LocalResponseNorm_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_empty_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_empty_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_race_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_race_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_MarginLoss_warnings_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad2d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad2d_large_deterministic_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad3d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad_empty_cuda_complex64, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad_empty_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReflectionPad_fails_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad1d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad2d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad3d_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad_empty_cuda_complex128, test/test_nn.py::TestNNDeviceTypeCUDA::test_ReplicationPad_empty_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_TransformerDecoderLayer_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_TransformerDecoder_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_TransformerEncoderLayer_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_TransformerEncoder_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_Transformer_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_Unfold_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_activations_bfloat16_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_activations_bfloat16_half_cpu_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_activations_bfloat16_half_cpu_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_adaptiveavg_pool1d_shmem_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_2d_rotate0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_2d_rotate45_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_2d_rotate90_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_2d_rotateRandom_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_affine_3d_rotateRandom_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_avg_pool_large_tensor2_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_avg_pool_large_tensor_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_affine_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_affine_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_affine_mixed_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_affine_mixed_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_eval_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_eval_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_eval_mixed_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_eval_mixed_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_grad_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_large_batch_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_large_batch_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_simple_average_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_simple_average_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_simple_average_mixed_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_simple_average_mixed_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_batchnorm_update_stats_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_channel_shuffle_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_error_if_nonfinite_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_0_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_1_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_2_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_4_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_False_norm_type_inf_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_0_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_1_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_2_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_4_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_foreach_True_norm_type_inf_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_multi_device_foreach_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_norm_multi_device_foreach_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_value_foreach_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_clip_grad_value_foreach_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_empty_input_cuda_complex128, test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_empty_input_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_empty_input_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_empty_input_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_64bit_reduction_mean_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_64bit_reduction_none_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_64bit_reduction_sum_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_label_smoothing_consistent_index_target_and_probs_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_label_smoothing_errors_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_label_smoothing_weight_ignore_indices_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_label_smoothing_with_probs_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_mean_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_none_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_sum_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_2d_out_of_bounds_class_index_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_2d_out_of_bounds_class_index_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_index_target_unit_weights_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_one_hot_target_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_all_reductions_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_mean_weighted_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_mean_weighted_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_none_weighted_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_none_weighted_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_sum_weighted_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_no_batch_dim_reduction_sum_weighted_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_loss_prob_target_unit_weights_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_cudnn_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_cudnn_tensor_cpu_length_cuda_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_cudnn_tensor_cuda_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_error_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_cudnn_rnn_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_cudnn_rnn_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_device_mask_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_elu_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_elu_inplace_with_neg_alpha_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_fold_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_glu_bfloat16_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_bfloat16_precision_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_half_precision_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_index_2d_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_index_2d_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_index_3d_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_large_index_3d_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_nan_inf_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_grid_sample_nan_inf_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_groupnorm_nhwc_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_groupnorm_nhwc_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_groupnorm_nhwc_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_groupnorm_nhwc_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_gumbel_softmax_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_gumbel_softmax_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_gumbel_softmax_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardsigmoid_grad_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_grad_corner_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_grad_corner_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_grad_corner_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_grad_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_hardswish_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_for_single_spatial_element_during_training_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm1d_no_batch_dim_False_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm1d_no_batch_dim_False_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm1d_no_batch_dim_True_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm1d_no_batch_dim_True_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm2d_no_batch_dim_False_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm2d_no_batch_dim_False_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm2d_no_batch_dim_True_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm2d_no_batch_dim_True_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm3d_no_batch_dim_False_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm3d_no_batch_dim_False_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm3d_no_batch_dim_True_affine_False_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_input_channels_is_not_num_features_InstanceNorm3d_no_batch_dim_True_affine_True_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_instancenorm_raises_error_if_less_than_one_value_per_channel_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_invalid_reduction_strings_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_large_max_pool2d_ch_last_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_large_max_pool_contig_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_large_reflect_pad_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_layernorm_half_precision_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_layernorm_weight_bias_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_leaky_relu_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_leaky_relu_inplace_with_neg_slope_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_leaky_relu_inplace_with_zero_slope_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_linear_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_log_softmax_big_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_log_softmax_big_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_log_softmax_cpu_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_log_softmax_cpu_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_logsigmoid_out_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_lstmcell_backward_only_one_output_grad_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_TxT_layout_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_devices_parity_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_forward_with_nans_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_grad_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_lowp_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_lowp_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_mask_types_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_masked_softmax_transformer_layout_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_mish_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_module_to_empty_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_module_to_empty_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_module_to_empty_non_recursive_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_mse_loss_error_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_1d_input_1d_target_invalid_size_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_all_ignored_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_byte_target_matches_long_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_empty_tensor_reduction_mean_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_empty_tensor_reduction_none_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_empty_tensor_reduction_sum_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_invalid_target_dim_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_invalid_weights_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_large_tensor_reduction_mean_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_large_tensor_reduction_none_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_large_tensor_reduction_sum_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_mismatched_batch_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_out_of_bounds_ignore_index_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nll_loss_total_weight_is_zero_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nn_empty_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nn_scalars_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nn_scalars_reductions_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_nonlinearity_propagate_nan_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_one_hot_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_overwrite_module_params_on_conversion_cpu_device_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_pad_cuda_complex128, test/test_nn.py::TestNNDeviceTypeCUDA::test_pad_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_prelu_backward_32bit_indexing_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_replicatepad_64bit_indexing_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_epsilon_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_epsilon_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_epsilon_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_epsilon_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_numeric_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rmsnorm_numeric_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_fused_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_fused_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_retain_variables_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_retain_variables_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_rnn_retain_variables_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_rrelu_bounds_validation_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_save_lstm_compatibility_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_silu_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_skip_init_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_smooth_l1_loss_bfloat16_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_smooth_l1_loss_vs_huber_loss_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_smoothl1loss_backward_zero_beta_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_64bit_indexing_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_64bit_indexing_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_smem_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_unaligned_grad_output_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_unaligned_output_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_backward_without_fully_vectorized_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_bfloat16_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_bfloat16_half_to_float_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_cpu_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_cpu_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_double_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_forward_64bit_indexing_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_results_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_softmax_results_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_softplus_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softplus_low_threshold_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softshrink_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softshrink_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_softshrink_negative_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_threshold_inplace_overlap_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_to_complex_cuda_complex128, test/test_nn.py::TestNNDeviceTypeCUDA::test_to_complex_cuda_complex64, test/test_nn.py::TestNNDeviceTypeCUDA::test_to_complex_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_fast_path_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_gelu_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_transformerencoderlayer_gelu_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_triplet_margin_with_distance_loss_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_triplet_margin_with_distance_loss_default_parity_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format0_align_corners_False_input_size_399_output_size_437_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format0_align_corners_False_input_size_403_output_size_377_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format0_align_corners_True_input_size_399_output_size_437_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format0_align_corners_True_input_size_403_output_size_377_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format1_align_corners_False_input_size_399_output_size_437_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format1_align_corners_False_input_size_403_output_size_377_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format1_align_corners_True_input_size_399_output_size_437_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiLinear2d_consistency_interp_size_bug_memory_format1_align_corners_True_input_size_403_output_size_377_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_False_mode_bicubic_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_False_mode_bicubic_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_False_mode_bilinear_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_False_mode_bilinear_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_True_mode_bicubic_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_True_mode_bicubic_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_True_mode_bilinear_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_False_align_corners_True_mode_bilinear_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_False_mode_bicubic_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_False_mode_bicubic_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_False_mode_bilinear_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_False_mode_bilinear_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_True_mode_bicubic_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_True_mode_bicubic_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_True_mode_bilinear_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_antialias_True_align_corners_True_mode_bilinear_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format0_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bicubic_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_False_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_False_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_3_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_32_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_False_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_False_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_restrided_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_consistency_memory_format1_mode_bilinear_antialias_True_align_corners_True_num_channels_5_output_size_600_check_as_unsqueezed_3d_tensor_True_non_contig_sliced_batch_size_5_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bicubic_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_bilinear_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest-exact_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_3_mode_nearest_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bicubic_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_bilinear_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest-exact_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_False_num_channels_5_mode_nearest_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bicubic_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_bilinear_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest-exact_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_3_mode_nearest_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bicubic_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_bilinear_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest-exact_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_float32_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_float64_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_int16_cuda_int16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_int32_cuda_int32, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_int64_cuda_int64, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_int8_cuda_int8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBiMode2d_nonsupported_dtypes_antialias_True_num_channels_5_mode_nearest_uint8_cuda_uint8, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBicubic2d_aa_correctness_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBicubic2d_aa_correctness_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBicubic2d_correctness_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBilinear2d_aa_correctness_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingBilinear2d_aa_correctness_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_correctness_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_correctness_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_launch_config_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest1d_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_correctness_memory_format0_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_correctness_memory_format0_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_correctness_memory_format1_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_correctness_memory_format1_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_launch_config_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_launch_fail_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_launch_rocm_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_memory_format0_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_memory_format0_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_memory_format1_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_memory_format1_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_correctness_memory_format0_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_correctness_memory_format0_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_correctness_memory_format1_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_correctness_memory_format1_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_launch_config_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_memory_format0_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_memory_format0_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_memory_format1_mode_nearest-exact_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest3d_memory_format1_mode_nearest_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact1d_correctness_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact1d_correctness_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact1d_rescale_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact2d_correctness_memory_format0_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact2d_correctness_memory_format0_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact2d_correctness_memory_format1_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact2d_correctness_memory_format1_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact3d_correctness_memory_format0_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact3d_correctness_memory_format0_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact3d_correctness_memory_format1_isize_10_osize_15_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearestExact3d_correctness_memory_format1_isize_20_osize_11_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingTrilinear3d_align_corners_False_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingTrilinear3d_align_corners_False_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingTrilinear3d_align_corners_True_memory_format0_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingTrilinear3d_align_corners_True_memory_format1_cuda, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsampling_64bit_indexing_channels_last_cuda_bfloat16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsampling_64bit_indexing_channels_last_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingnearest2d_backward_64bit_indexing_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_variable_sequence_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_variable_sequence_cuda_float32, test/test_nn.py::TestNNDeviceTypeCUDA::test_variable_sequence_cuda_float64, test/test_nn.py::TestNNDeviceTypeCUDA::test_warp_softmax_64bit_indexing_cuda_float16, test/test_nn.py::TestNNDeviceTypeCUDA::test_warp_softmax_64bit_indexing_cuda_float32
2025-12-04T14:32:36.8459252Z 
2025-12-04T14:32:36.8459444Z Finished test_nn 1/1 ... [2025-12-04 14:32:36.684475][20396.612766287], took 5.78min
2025-12-04T14:32:36.8460058Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_nn/test_nn-9b13ea9411b68db1.xml
2025-12-04T14:32:38.4145883Z Uploading artifacts took 1.50 seconds
2025-12-04T14:32:38.4149131Z Running torch_np/numpy_tests/lib/test_index_tricks 1/1 ... [2025-12-04 14:32:38.414671][20398.342967493]
2025-12-04T14:32:38.4149656Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:32:38.4153089Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_index_tricks.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:32:38.415040]
2025-12-04T14:32:41.9360988Z 
2025-12-04T14:32:41.9362447Z torch_np/numpy_tests/lib/test_index_tricks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_index_tricks_1.1_a3265ffbea1c2963_.log
2025-12-04T14:32:41.9375486Z Running 47 items in this shard: test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_0d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_big_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_clipmodes, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_dtypes, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_clip, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_raise, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_wrap, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_unravel, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_writeability, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_longdouble, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_npcomplexfloating, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_npfloating, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_linspace_equivalence, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_mgrid_size_none_handling_start0_stop_10_step0_expected0, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_mgrid_size_none_handling_start_-10_stop_20_step1_expected1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_nd, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_sparse, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_0d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_1d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_2d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_complex_step, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_mixed_type, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_more_mixed_type, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestNdenumerate::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIndexExpression::test_regression_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIndexExpression::test_simple_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_1d_only, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_bool, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_regression_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_repeated_input, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_shape_and_dtype, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestC::test_c_, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_hetero_shape_handling, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_low_dim_handling, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_operate_4d_array, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_tall_matrix, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_tall_matrix_wrap, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_wide_matrix, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndices::test_diag_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_diag_indices_from, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_error_shape_mismatch, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_error_small_input, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestNdIndex::test_ndindex
2025-12-04T14:32:41.9387105Z 
2025-12-04T14:32:41.9387382Z Finished torch_np/numpy_tests/lib/test_index_tricks 1/1 ... [2025-12-04 14:32:41.935902][20401.864186735], took 0.06min
2025-12-04T14:32:41.9707285Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-6bd6911b2987aa11.xml
2025-12-04T14:32:42.0032822Z Running test_jit_autocast 1/1 ... [2025-12-04 14:32:42.003037][20401.931335564]
2025-12-04T14:32:42.0033515Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:32:42.0036271Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_jit_autocast.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:32:42.003335]
2025-12-04T14:33:04.0700866Z 
2025-12-04T14:33:04.0701669Z test_jit_autocast 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_jit_autocast_1.1_125de5112eb33739_.log
2025-12-04T14:33:04.0712607Z Running 54 items in this shard: test/test_jit_autocast.py::TestAutocast::test_autocast_api, test/test_jit_autocast.py::TestAutocast::test_autocast_api_not_supported, test/test_jit_autocast.py::TestAutocast::test_autocast_autodiff, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator_outside_jit, test/test_jit_autocast.py::TestAutocast::test_autocast_mixed_dtypes, test/test_jit_autocast.py::TestAutocast::test_callees, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_off, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_on, test/test_jit_autocast.py::TestAutocast::test_conditional_autocast, test/test_jit_autocast.py::TestAutocast::test_control_flow, test/test_jit_autocast.py::TestAutocast::test_divergent_autocast, test/test_jit_autocast.py::TestAutocast::test_divergent_types, test/test_jit_autocast.py::TestAutocast::test_duplicate_inputs, test/test_jit_autocast.py::TestAutocast::test_eager_and_script, test/test_jit_autocast.py::TestAutocast::test_explicit_casts, test/test_jit_autocast.py::TestAutocast::test_fp32_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_policy_with_fp64, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_ignore_amp, test/test_jit_autocast.py::TestAutocast::test_implicitly_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_inplace, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_cpu, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_gpu, test/test_jit_autocast.py::TestAutocast::test_jit_call_method_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_executor_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_basic, test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_constants, test/test_jit_autocast.py::TestAutocast::test_jit_generic_autocast, test/test_jit_autocast.py::TestAutocast::test_linear_bf16, test/test_jit_autocast.py::TestAutocast::test_minimal, test/test_jit_autocast.py::TestAutocast::test_minimal_cpu, test/test_jit_autocast.py::TestAutocast::test_minimal_off, test/test_jit_autocast.py::TestAutocast::test_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_promote_policy, test/test_jit_autocast.py::TestAutocast::test_promote_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_reused_autocast, test/test_jit_autocast.py::TestAutocast::test_reused_autocast_expr, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state_expr, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing_with_autocast, test/test_jit_autocast.py::TestAutocast::test_script_module, test/test_jit_autocast.py::TestAutocast::test_tracing_and_script, test/test_jit_autocast.py::TestAutocast::test_tracing_with_autocast_and_script, test/test_jit_autocast.py::TestJitTraceAutocast::test_cat_promote, test/test_jit_autocast.py::TestJitTraceAutocast::test_generate_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nchw_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nhwc_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cpu, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cuda, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_enable_and_check, test/test_jit_autocast.py::TestJitTraceAutocast::test_scripted_aliasing
2025-12-04T14:33:04.0723123Z 
2025-12-04T14:33:04.0723325Z Finished test_jit_autocast 1/1 ... [2025-12-04 14:33:04.069940][20423.998237005], took 0.37min
2025-12-04T14:33:04.1046281Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-a884db72e4413a95.xml
2025-12-04T14:33:04.1831916Z Running nn/test_pooling 1/1 ... [2025-12-04 14:33:04.182930][20424.111229562]
2025-12-04T14:33:04.1832366Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:33:04.1835179Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_pooling.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:33:04.183242]
2025-12-04T14:33:14.0156003Z 
2025-12-04T14:33:14.0156798Z nn/test_pooling 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_pooling_1.1_2fc4f4b87243805c_.log
2025-12-04T14:33:14.0196681Z Running 147 items in this shard: test/nn/test_pooling.py::TestAvgPool::test_avg_pool1d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_avg_pool2d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_avg_pool3d_ceil_mode, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool2d, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool2d_with_divisor, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool3d, test/nn/test_pooling.py::TestAvgPool::test_doubletensor_avg_pool3d_with_divisor, test/nn/test_pooling.py::TestPoolingNN::test_MaxUnpool2d_output_size, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_avg_pooling_nhwc_overflow, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_avg_pooling_overflow, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_launch_config_backward, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_launch_config_forward, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_avg_nhwc_non_contiguous, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_lower_precision, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_size_none, test/nn/test_pooling.py::TestPoolingNN::test_adaptive_pooling_size_overflow, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool2d_nhwc_cpu, test/nn/test_pooling.py::TestPoolingNN::test_max_unpool3d_input_check, test/nn/test_pooling.py::TestPoolingNN::test_quantized_max_pool1d_empty_kernel, test/nn/test_pooling.py::TestPoolingNN::test_quantized_max_pool3d, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool1d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool2d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool3d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AdaptiveMaxPool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AvgPool2d_empty_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_AvgPool3d_backward_after_cat_dim1_device_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_batch_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_out_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool2d_zero_samples_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_errors_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_batch_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_out_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_FractionalMaxPool3d_zero_samples_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_LPPool1d_kernel_size_overflow_large_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool1d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool2d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_errors_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool3d_indices_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxPool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case10_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case4_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case5_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case6_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case7_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case8_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_index_errors_case9_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_invalid_output_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_MaxUnpool_zero_batch_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pool2d_output_size_one_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pool3d_output_size_one_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_avg_pooling_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_max_pooling_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pool_odd_size_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_empty_output_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_max_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_max_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_int8, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_no_suppot_input_cuda_uint8, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_zero_batch_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_adaptive_pooling_zero_batch_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_reduced_floating_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_avg_pool2d_reduced_floating_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool2d_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool2d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool3d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_fractional_max_pool_nan_inf_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_corner_cases_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_corner_cases_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool1d_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_corner_cases_cuda_int32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_corner_cases_cuda_int64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_indices_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool2d_with_indices_backward_fails_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool3d_ndhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_bfloat16_half_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_bfloat16_half_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_pool_nan_inf_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_max_unpool_invalid_indices_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool3d_non_square_backward_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_maxpool_indices_no_batch_dim_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool3d_large_size_int64_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool3d_size_one_feature_dim_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_invalid_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_bfloat16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float16, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pool_large_size_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_bfloat16_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_large_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_max_nhwc_cuda_float32, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_max_nhwc_cuda_float64, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_avg_pooling_dims_3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_1_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_2_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_shape_kernel_max_pooling_dims_3_cuda, test/nn/test_pooling.py::TestPoolingNNDeviceTypeCUDA::test_pooling_zero_stride_cuda
2025-12-04T14:33:14.0234832Z 
2025-12-04T14:33:14.0235030Z Finished nn/test_pooling 1/1 ... [2025-12-04 14:33:14.015642][20433.943940075], took 0.16min
2025-12-04T14:33:14.0503950Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_pooling/nn.test_pooling-21d4709389e5b8ee.xml
2025-12-04T14:33:14.1551794Z Running nn/test_embedding 1/1 ... [2025-12-04 14:33:14.154929][20434.083227628]
2025-12-04T14:33:14.1552212Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:33:14.1554572Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_embedding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:33:14.155216]
2025-12-04T14:33:26.6899079Z 
2025-12-04T14:33:26.6899978Z nn/test_embedding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_embedding_1.1_0d2f816320a0d140_.log
2025-12-04T14:33:26.6953497Z Running 156 items in this shard: test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_from_pretrained, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_from_pretrained_padding_idx, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_functional, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_bag_padding_idx_error, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_float32, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_float64, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int16, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int32, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int64, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_int8, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_options, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_padding_idx, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_from_pretrained_uint8, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_functional, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_max_norm, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_max_norm_unsorted_repeating_indices, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_sparse_basic, test/nn/test_embedding.py::TestEmbeddingNN::test_embedding_sparse_empty_tensor, test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_2d_include_last_offset, test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_from_pretrained, test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_from_pretrained_options, test/nn/test_embedding.py::TestEmbeddingNN::test_embeddingbag_include_last_offset, test/nn/test_embedding.py::TestEmbeddingNN::test_large_tensors, test/nn/test_embedding.py::TestEmbeddingNN::test_move_sparse_half_embedding, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_empty_per_sample_weights_and_offsets_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_no_offsets_cuda_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_EmbeddingBag_per_sample_weights_failures_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_backward_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_backward_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_backward_large_batch_overflow_cuda_bfloat16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_1D_padding_idx_cuda_bfloat16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_1D_padding_idx_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_2D_padding_idx_cuda_bfloat16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_2D_padding_idx_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_bfloat16_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_device_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_dimension_errors_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_empty_input_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_half_cuda_int64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int32_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int32_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int32_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int32_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int64_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int64_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_non_contiguous_weight_cuda_int64_int64_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_max_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_mean_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx0_mode_sum_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_max_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_mean_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float32_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float32_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float64_int32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_out_of_bounds_idx_padding_idx_0_mode_sum_cuda_float64_int64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_False_per_sample_weights_use_grad_False_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_False_per_sample_weights_use_grad_True_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_True_per_sample_weights_use_grad_False_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_bag_per_sample_weights_grad_bag_use_grad_True_per_sample_weights_use_grad_True_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_dense_grad_cuda, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_backward_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_backward_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_backward_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_device_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_device_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_device_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_fwd_AD_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_fwd_AD_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_max_norm_fwd_AD_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_padding_idx_cuda_float16, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_padding_idx_cuda_float32, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_padding_idx_cuda_float64, test/nn/test_embedding.py::TestEmbeddingNNDeviceTypeCUDA::test_embedding_scalar_weight_error_cuda
2025-12-04T14:33:26.7004482Z 
2025-12-04T14:33:26.7004689Z Finished nn/test_embedding 1/1 ... [2025-12-04 14:33:26.689980][20446.61827472], took 0.21min
2025-12-04T14:33:26.7250628Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_embedding/nn.test_embedding-ee3bcbb253c53d76.xml
2025-12-04T14:33:26.8136493Z Running test_xnnpack_integration 1/1 ... [2025-12-04 14:33:26.813409][20446.741708008]
2025-12-04T14:33:26.8136936Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:33:26.8139794Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_xnnpack_integration.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:33:26.813719]
2025-12-04T14:33:37.1340874Z 
2025-12-04T14:33:37.1341744Z test_xnnpack_integration 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_xnnpack_integration_1.1_cee22bfc3e04069b_.log
2025-12-04T14:33:37.1345621Z Running 12 items in this shard: test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKOps::test_linear, test/test_xnnpack_integration.py::TestXNNPACKOps::test_linear_1d_input, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_combined_model, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_decomposed_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_linear, test/test_xnnpack_integration.py::TestXNNPACKConv1dTransformPass::test_conv1d_basic, test/test_xnnpack_integration.py::TestXNNPACKConv1dTransformPass::test_conv1d_with_relu_fc
2025-12-04T14:33:37.1348892Z 
2025-12-04T14:33:37.1349172Z Finished test_xnnpack_integration 1/1 ... [2025-12-04 14:33:37.133695][20457.061988547], took 0.17min
2025-12-04T14:33:37.1683612Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-f847b013a309e4b4.xml
2025-12-04T14:33:37.2587629Z Running test_cuda_trace 1/1 ... [2025-12-04 14:33:37.258505][20457.186804792]
2025-12-04T14:33:37.2588048Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:33:37.2591028Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_trace.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:33:37.258825]
2025-12-04T14:34:21.1299881Z 
2025-12-04T14:34:21.1300648Z test_cuda_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_trace_1.1_a09bfc4c91236df9_.log
2025-12-04T14:34:21.1303814Z Running 12 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_all_trace_callbacks_called, test/test_cuda_trace.py::TestCudaTrace::test_device_synchronization_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_creation_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_deletion_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_record_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_synchronization_callback, test/test_cuda_trace.py::TestCudaTrace::test_event_wait_callback, test/test_cuda_trace.py::TestCudaTrace::test_memcpy_synchronization, test/test_cuda_trace.py::TestCudaTrace::test_memory_allocation_callback, test/test_cuda_trace.py::TestCudaTrace::test_memory_deallocation_callback, test/test_cuda_trace.py::TestCudaTrace::test_stream_creation_callback, test/test_cuda_trace.py::TestCudaTrace::test_stream_synchronization_callback
2025-12-04T14:34:21.1306483Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_all_trace_callbacks_called
2025-12-04T14:34:21.1307025Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_device_synchronization_callback
2025-12-04T14:34:21.1307837Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_creation_callback
2025-12-04T14:34:21.1308366Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_deletion_callback
2025-12-04T14:34:21.1308876Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_record_callback
2025-12-04T14:34:21.1309402Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_synchronization_callback
2025-12-04T14:34:21.1310045Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_event_wait_callback
2025-12-04T14:34:21.1310566Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memcpy_synchronization
2025-12-04T14:34:21.1311086Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memory_allocation_callback
2025-12-04T14:34:21.1311639Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_memory_deallocation_callback
2025-12-04T14:34:21.1312162Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_stream_creation_callback
2025-12-04T14:34:21.1312681Z Running 1 items in this shard: test/test_cuda_trace.py::TestCudaTrace::test_stream_synchronization_callback
2025-12-04T14:34:21.1312999Z 
2025-12-04T14:34:21.1313181Z Finished test_cuda_trace 1/1 ... [2025-12-04 14:34:21.130001][20501.058296639], took 0.73min
2025-12-04T14:34:21.1654615Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-d415fb392a79d6b5.xml
2025-12-04T14:34:21.2481933Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-020323916fe208b4.xml
2025-12-04T14:34:21.2781412Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-ec2ab5853a8f5bb5.xml
2025-12-04T14:34:21.3049778Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-f2344770aa1066fe.xml
2025-12-04T14:34:21.3324132Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-7974fe8ff8b4bbf1.xml
2025-12-04T14:34:21.3609026Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-a89ca46644423e33.xml
2025-12-04T14:34:21.3908728Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-a24d5339b3f8a425.xml
2025-12-04T14:34:21.4181212Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-e1f683ecde7e9859.xml
2025-12-04T14:34:21.4586455Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-18c9d7ef1c10ca2f.xml
2025-12-04T14:34:21.4897647Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c34005545b418b69.xml
2025-12-04T14:34:21.5146544Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-fef6b14731bb5c3b.xml
2025-12-04T14:34:21.5464047Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-9f269287574f2f5d.xml
2025-12-04T14:34:21.5748947Z Running test_native_mha 1/1 ... [2025-12-04 14:34:21.574689][20501.502988246]
2025-12-04T14:34:21.5749377Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:34:21.5752289Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_native_mha.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:34:21.574982]
2025-12-04T14:34:25.7962187Z 
2025-12-04T14:34:25.7963266Z test_native_mha 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_native_mha_1.1_0baa40d098e975c1_.log
2025-12-04T14:34:25.7992622Z Running 54 items in this shard: test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_attention_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_attention_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_encoder_decoder_attention_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_encoder_decoder_attention_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float16, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_transform_bias_rescale_qkv_cuda_float32, test/test_native_mha.py::TestMHADeviceTypeCUDA::test_transform_bias_rescale_qkv_nested_cuda_float32
2025-12-04T14:34:25.8019794Z 
2025-12-04T14:34:25.8019981Z Finished test_native_mha 1/1 ... [2025-12-04 14:34:25.796051][20505.72434675], took 0.07min
2025-12-04T14:34:25.8325473Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_native_mha/test_native_mha-92c62998ce15c273.xml
2025-12-04T14:34:25.8648914Z Running torch_np/numpy_tests/core/test_numerictypes 1/1 ... [2025-12-04 14:34:25.864651][20505.792949057]
2025-12-04T14:34:25.8649732Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:34:25.8652192Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_numerictypes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:34:25.864943]
2025-12-04T14:34:29.2353113Z 
2025-12-04T14:34:29.2354789Z torch_np/numpy_tests/core/test_numerictypes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_numerictypes_1.1_2b678d8fcdaea099_.log
2025-12-04T14:34:29.2367562Z Running 34 items in this shard: test/torch_np/numpy_tests/core/test_numerictypes.py::TestCommonType::test_scalar_loses1, test/torch_np/numpy_tests/core/test_numerictypes.py::TestCommonType::test_scalar_loses2, test/torch_np/numpy_tests/core/test_numerictypes.py::TestCommonType::test_scalar_wins, test/torch_np/numpy_tests/core/test_numerictypes.py::TestCommonType::test_scalar_wins2, test/torch_np/numpy_tests/core/test_numerictypes.py::TestCommonType::test_scalar_wins3, test/torch_np/numpy_tests/core/test_numerictypes.py::TestIsSubDType::test_both_abstract, test/torch_np/numpy_tests/core/test_numerictypes.py::TestIsSubDType::test_nondtype_nonscalartype, test/torch_np/numpy_tests/core/test_numerictypes.py::TestIsSubDType::test_same, test/torch_np/numpy_tests/core/test_numerictypes.py::TestIsSubDType::test_sibling_class, test/torch_np/numpy_tests/core/test_numerictypes.py::TestIsSubDType::test_subclass, test/torch_np/numpy_tests/core/test_numerictypes.py::TestIsSubDType::test_subclass_backwards, test/torch_np/numpy_tests/core/test_numerictypes.py::TestBitName::test_abstract, test/torch_np/numpy_tests/core/test_numerictypes.py::TestDocStrings::test_platform_dependent_aliases, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t0, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t1, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t2, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t3, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t4, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t5, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t6, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t7, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t8, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_undersood_by_dtype_t9, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_are_unique, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t0, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t1, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t2, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t3, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t4, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t5, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t6, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t7, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t8, test/torch_np/numpy_tests/core/test_numerictypes.py::TestScalarTypeNames::test_names_reflect_attributes_t9
2025-12-04T14:34:29.2377553Z 
2025-12-04T14:34:29.2377842Z Finished torch_np/numpy_tests/core/test_numerictypes 1/1 ... [2025-12-04 14:34:29.234992][20509.163287227], took 0.06min
2025-12-04T14:34:29.2719639Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_numerictypes/torch_np.numpy_tests.core.test_numerictypes-794afd975a25dd08.xml
2025-12-04T14:34:29.3027120Z Running test_cuda_nvml_based_avail 1/1 ... [2025-12-04 14:34:29.302486][20509.230785716]
2025-12-04T14:34:29.3027554Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:34:29.3030469Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_nvml_based_avail.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:34:29.302790]
2025-12-04T14:34:58.9806270Z 
2025-12-04T14:34:58.9807292Z test_cuda_nvml_based_avail 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_nvml_based_avail_1.1_f1988982a9fd6374_.log
2025-12-04T14:34:58.9811847Z Running 9 items in this shard: test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_False_avoid_init2, test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_False_avoid_init_0, test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_False_avoid_init_1, test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_True_avoid_init2, test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_True_avoid_init_0, test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_True_avoid_init_1, test/test_cuda_nvml_based_avail.py::TestVisibleDeviceParses::test_env_var_parsing, test/test_cuda_nvml_based_avail.py::TestVisibleDeviceParses::test_ordinal_parse_visible_devices, test/test_cuda_nvml_based_avail.py::TestVisibleDeviceParses::test_partial_uuid_resolver
2025-12-04T14:34:58.9814922Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_False_avoid_init2
2025-12-04T14:34:58.9816001Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_False_avoid_init_0
2025-12-04T14:34:58.9817470Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_False_avoid_init_1
2025-12-04T14:34:58.9818372Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_True_avoid_init2
2025-12-04T14:34:58.9819128Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_True_avoid_init_0
2025-12-04T14:34:58.9819898Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestExtendedCUDAIsAvail::test_cuda_is_available_nvml_avail_True_avoid_init_1
2025-12-04T14:34:58.9820565Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestVisibleDeviceParses::test_env_var_parsing
2025-12-04T14:34:58.9821183Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestVisibleDeviceParses::test_ordinal_parse_visible_devices
2025-12-04T14:34:58.9821824Z Running 1 items in this shard: test/test_cuda_nvml_based_avail.py::TestVisibleDeviceParses::test_partial_uuid_resolver
2025-12-04T14:34:58.9822183Z 
2025-12-04T14:34:58.9822390Z Finished test_cuda_nvml_based_avail 1/1 ... [2025-12-04 14:34:58.980513][20538.908808132], took 0.49min
2025-12-04T14:34:59.0178655Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-264e490a42609567.xml
2025-12-04T14:34:59.1032939Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3424f861fbb3f30b.xml
2025-12-04T14:34:59.1376259Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-34f31a11d054a5e2.xml
2025-12-04T14:34:59.1694336Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-5cc2b0915fc09b1e.xml
2025-12-04T14:34:59.2259404Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-8b636380fe175cf1.xml
2025-12-04T14:34:59.2577247Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3e5b65c914a199ce.xml
2025-12-04T14:34:59.2999308Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-df0d139575237859.xml
2025-12-04T14:34:59.3426213Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3264b6f40acb9dc5.xml
2025-12-04T14:34:59.3738636Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-8f705997ba04d72c.xml
2025-12-04T14:34:59.4339855Z Running test_function_schema 1/1 ... [2025-12-04 14:34:59.433744][20539.362041768]
2025-12-04T14:34:59.4340291Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:34:59.4343071Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_function_schema.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:34:59.434032]
2025-12-04T14:35:02.9546880Z 
2025-12-04T14:35:02.9548547Z test_function_schema 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_function_schema_1.1_77625ad433e579dc_.log
2025-12-04T14:35:02.9554056Z Running 15 items in this shard: test/test_function_schema.py::TestFunctionSchema::test_backward_compatible_arguments, test/test_function_schema.py::TestFunctionSchema::test_backward_compatible_outputs, test/test_function_schema.py::TestFunctionSchema::test_backward_compatible_structure, test/test_function_schema.py::TestFunctionSchema::test_backward_compatible_with_smart_serialization, test/test_function_schema.py::TestFunctionSchema::test_forward_compatible_arguments_real_use_case, test/test_function_schema.py::TestFunctionSchema::test_forward_compatible_arguments_with_out, test/test_function_schema.py::TestFunctionSchema::test_forward_compatible_arguments_without_out, test/test_function_schema.py::TestFunctionSchema::test_hash_schema, test/test_function_schema.py::TestFunctionSchema::test_out_schema, test/test_function_schema.py::TestFunctionSchema::test_schema_error, test/test_function_schema.py::TestFunctionSchema::test_serialize_and_deserialize, test/test_function_schema.py::TestFunctionSchema::test_string_optional_parameter_default_value, test/test_function_schema.py::TestFunctionSchema::test_sym_int_argument_properly_parsed, test/test_function_schema.py::TestFunctionSchema::test_tensor_list_alias_annotation_properly_parsed, test/test_function_schema.py::TestFunctionSchema::test_tensor_option_arguments_properly_parsed
2025-12-04T14:35:02.9558117Z 
2025-12-04T14:35:02.9558334Z Finished test_function_schema 1/1 ... [2025-12-04 14:35:02.954338][20542.882626612], took 0.06min
2025-12-04T14:35:02.9923486Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_function_schema/test_function_schema-23d7f60e4862c430.xml
2025-12-04T14:35:03.0264645Z Running test_accelerator 1/1 ... [2025-12-04 14:35:03.026224][20542.954524021]
2025-12-04T14:35:03.0265345Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:35:03.0267855Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_accelerator.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:35:03.026513]
2025-12-04T14:35:06.7976756Z 
2025-12-04T14:35:06.7977712Z test_accelerator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_accelerator_1.1_df51eee5e1970e59_.log
2025-12-04T14:35:06.7981622Z Running 12 items in this shard: test/test_accelerator.py::TestAccelerator::test_current_accelerator, test/test_accelerator.py::TestAccelerator::test_current_stream_query, test/test_accelerator.py::TestAccelerator::test_device_context_manager, test/test_accelerator.py::TestAccelerator::test_generic_event_behavior, test/test_accelerator.py::TestAccelerator::test_generic_multi_device_behavior, test/test_accelerator.py::TestAccelerator::test_generic_stream_behavior, test/test_accelerator.py::TestAccelerator::test_get_memory_info, test/test_accelerator.py::TestAccelerator::test_memory_stats, test/test_accelerator.py::TestAccelerator::test_multi_device_context_manager, test/test_accelerator.py::TestAccelerator::test_multi_device_stream_context_manager, test/test_accelerator.py::TestAccelerator::test_pin_memory_on_non_blocking_copy, test/test_accelerator.py::TestAccelerator::test_stream_context_manager
2025-12-04T14:35:06.7984713Z 
2025-12-04T14:35:06.7984921Z Finished test_accelerator 1/1 ... [2025-12-04 14:35:06.797373][20546.725663795], took 0.06min
2025-12-04T14:35:06.8349910Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_accelerator/test_accelerator-96bd402fc51988fa.xml
2025-12-04T14:35:06.8701594Z Running nn/test_init 1/1 ... [2025-12-04 14:35:06.869916][20546.798215867]
2025-12-04T14:35:06.8702008Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:35:06.8704828Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_init.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:35:06.870206]
2025-12-04T14:35:13.2956163Z 
2025-12-04T14:35:13.2957639Z nn/test_init 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_init_1.1_83fc7b147f0c612f_.log
2025-12-04T14:35:13.2965753Z Running 30 items in this shard: test/nn/test_init.py::TestNNInit::test_calculate_gain_leaky_relu, test/nn/test_init.py::TestNNInit::test_calculate_gain_leaky_relu_only_accepts_numbers, test/nn/test_init.py::TestNNInit::test_calculate_gain_linear, test/nn/test_init.py::TestNNInit::test_calculate_gain_nonlinear, test/nn/test_init.py::TestNNInit::test_calculate_gain_only_accepts_valid_nonlinearities, test/nn/test_init.py::TestNNInit::test_constant, test/nn/test_init.py::TestNNInit::test_deprecation, test/nn/test_init.py::TestNNInit::test_dirac_identity, test/nn/test_init.py::TestNNInit::test_dirac_only_works_on_3_4_5d_inputs, test/nn/test_init.py::TestNNInit::test_dirac_properties, test/nn/test_init.py::TestNNInit::test_eye, test/nn/test_init.py::TestNNInit::test_eye_only_works_on_2d_inputs, test/nn/test_init.py::TestNNInit::test_kaiming_normal, test/nn/test_init.py::TestNNInit::test_kaiming_normal_errors_on_inputs_smaller_than_2d, test/nn/test_init.py::TestNNInit::test_kaiming_normal_warning_on_0element_tensor, test/nn/test_init.py::TestNNInit::test_kaiming_uniform, test/nn/test_init.py::TestNNInit::test_kaiming_uniform_errors_on_inputs_smaller_than_2d, test/nn/test_init.py::TestNNInit::test_kaiming_uniform_warning_on_0element_tensor, test/nn/test_init.py::TestNNInit::test_normal, test/nn/test_init.py::TestNNInit::test_ones_and_zeros, test/nn/test_init.py::TestNNInit::test_orthogonal, test/nn/test_init.py::TestNNInit::test_sparse_default_std, test/nn/test_init.py::TestNNInit::test_sparse_only_works_on_2d_inputs, test/nn/test_init.py::TestNNInit::test_trunc_normal, test/nn/test_init.py::TestNNInit::test_trunc_normal_generator, test/nn/test_init.py::TestNNInit::test_uniform, test/nn/test_init.py::TestNNInit::test_xavier_normal, test/nn/test_init.py::TestNNInit::test_xavier_normal_errors_on_inputs_smaller_than_2d, test/nn/test_init.py::TestNNInit::test_xavier_uniform, test/nn/test_init.py::TestNNInit::test_xavier_uniform_errors_on_inputs_smaller_than_2d
2025-12-04T14:35:13.2971346Z 
2025-12-04T14:35:13.2971565Z Finished nn/test_init 1/1 ... [2025-12-04 14:35:13.295298][20553.223588868], took 0.11min
2025-12-04T14:35:13.3344332Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_init/nn.test_init-4d2f3b79bbd3891e.xml
2025-12-04T14:35:13.3936482Z Running torch_np/numpy_tests/core/test_scalar_methods 1/1 ... [2025-12-04 14:35:13.393381][20553.321678288]
2025-12-04T14:35:13.3937059Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:35:13.3939597Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_scalar_methods.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:35:13.393681]
2025-12-04T14:35:16.4132838Z 
2025-12-04T14:35:16.4134606Z torch_np/numpy_tests/core/test_scalar_methods 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_scalar_methods_1.1_ab75aa6938d6d2cf_.log
2025-12-04T14:35:16.4156463Z Running 77 items in this shard: test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_against_known_values, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_errors_ftype0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_errors_ftype1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_errors_ftype2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_roundtrip_ftype0_frac_vals0_exp_vals0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_roundtrip_ftype1_frac_vals1_exp_vals1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_roundtrip_ftype2_frac_vals2_exp_vals2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_simple_fractions_ftype0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_simple_fractions_ftype1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_simple_fractions_ftype2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype0_f_-0_875_ratio1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype0_f_0_0_ratio2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype0_f_0_875_ratio0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype0_f_11_5_ratio3, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype1_f_-0_875_ratio1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype1_f_0_0_ratio2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype1_f_0_875_ratio0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype1_f_11_5_ratio3, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype2_f_-0_875_ratio1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype2_f_0_0_ratio2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype2_f_0_875_ratio0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestAsIntegerRatio::test_small_ftype2_f_11_5_ratio3, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_false_code_b, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_false_code_h, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_false_code_i, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_false_code_l, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_special_str_value_inf_code_d, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_special_str_value_inf_code_e, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_special_str_value_inf_code_f, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_special_str_value_nan_code_d, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_special_str_value_nan_code_e, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_special_str_value_nan_code_f, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_B, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_b, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_d, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_e, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_f, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_h, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_i, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestIsInteger::test_true_code_l, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_cls0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_cls1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_cls2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_cls3, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_cls4, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_cls5, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_complexfloating, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_complexfloating_subscript_tuple_arg_len_0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_complexfloating_subscript_tuple_arg_len_1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_complexfloating_subscript_tuple_arg_len_2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_complexfloating_subscript_tuple_arg_len_3, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_abc_non_numeric_cls0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_?, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_B, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_D, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_F, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_b, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_d, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_e, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_f, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_h, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_i, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_concrete_code_l, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_subscript_scalar, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_subscript_tuple_arg_len_0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_subscript_tuple_arg_len_1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_subscript_tuple_arg_len_2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestClassGetItem::test_subscript_tuple_arg_len_3, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_bit_count, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype0, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype1, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype2, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype3, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype4, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype5, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype6, test/torch_np/numpy_tests/core/test_scalar_methods.py::TestBitCount::test_small_itype7
2025-12-04T14:35:16.4177476Z 
2025-12-04T14:35:16.4177755Z Finished torch_np/numpy_tests/core/test_scalar_methods 1/1 ... [2025-12-04 14:35:16.413193][20556.341489955], took 0.05min
2025-12-04T14:35:16.4520203Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_scalar_methods/torch_np.numpy_tests.core.test_scalar_methods-ab020ac5345dfbce.xml
2025-12-04T14:35:16.4965989Z Running torch_np/numpy_tests/fft/test_helper 1/1 ... [2025-12-04 14:35:16.496316][20556.424615962]
2025-12-04T14:35:16.4966804Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:35:16.4969194Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/fft/test_helper.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:35:16.496610]
2025-12-04T14:35:24.1234875Z 
2025-12-04T14:35:24.1236505Z torch_np/numpy_tests/fft/test_helper 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.fft.test_helper_1.1_692efc716c4081b6_.log
2025-12-04T14:35:24.1239491Z Running 8 items in this shard: test/torch_np/numpy_tests/fft/test_helper.py::TestFFTShift::test_axes_keyword, test/torch_np/numpy_tests/fft/test_helper.py::TestFFTShift::test_definition, test/torch_np/numpy_tests/fft/test_helper.py::TestFFTShift::test_equal_to_original, test/torch_np/numpy_tests/fft/test_helper.py::TestFFTShift::test_inverse, test/torch_np/numpy_tests/fft/test_helper.py::TestFFTShift::test_uneven_dims, test/torch_np/numpy_tests/fft/test_helper.py::TestFFTFreq::test_definition, test/torch_np/numpy_tests/fft/test_helper.py::TestRFFTFreq::test_definition, test/torch_np/numpy_tests/fft/test_helper.py::TestIRFFTN::test_not_last_axis_success
2025-12-04T14:35:24.1241548Z 
2025-12-04T14:35:24.1241803Z Finished torch_np/numpy_tests/fft/test_helper 1/1 ... [2025-12-04 14:35:24.123094][20564.051384261], took 0.13min
2025-12-04T14:35:24.1622193Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_helper/torch_np.numpy_tests.fft.test_helper-f56f2bb61bb011d4.xml
2025-12-04T14:35:24.2945823Z Running test_mobile_optimizer 1/1 ... [2025-12-04 14:35:24.294304][20564.222602934]
2025-12-04T14:35:24.2946541Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:35:24.2948736Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_mobile_optimizer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:35:24.294583]
2025-12-04T14:35:29.2173716Z 
2025-12-04T14:35:29.2174636Z test_mobile_optimizer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mobile_optimizer_1.1_9983f8f253c5b059_.log
2025-12-04T14:35:29.2177210Z Running 7 items in this shard: test/test_mobile_optimizer.py::TestOptimizer::test_clone_module_with_class, test/test_mobile_optimizer.py::TestOptimizer::test_generate_mobile_module_lints, test/test_mobile_optimizer.py::TestOptimizer::test_hoist_conv_packed_params, test/test_mobile_optimizer.py::TestOptimizer::test_mobilenet_optimize_for_mobile, test/test_mobile_optimizer.py::TestOptimizer::test_optimize_for_mobile, test/test_mobile_optimizer.py::TestOptimizer::test_preserve_bundled_inputs_methods, test/test_mobile_optimizer.py::TestOptimizer::test_quantized_conv_no_asan_failures
2025-12-04T14:35:29.2179704Z 
2025-12-04T14:35:29.2179974Z Finished test_mobile_optimizer 1/1 ... [2025-12-04 14:35:29.217130][20569.145425029], took 0.08min
2025-12-04T14:35:29.2559111Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_mobile_optimizer/test_mobile_optimizer-6eaefe04adaaf056.xml
2025-12-04T14:35:29.2939934Z Running test_overrides 1/1 ... [2025-12-04 14:35:29.293759][20569.2220571]
2025-12-04T14:35:29.2940393Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:35:29.2943398Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_overrides.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:35:29.294079]
2025-12-04T14:35:35.9195668Z 
2025-12-04T14:35:35.9196818Z test_overrides 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_overrides_1.1_89ffca0310ffa68a_.log
2025-12-04T14:35:35.9526487Z Running 1477 items in this shard: test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_H___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_T___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__backward_hooks___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__base___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__cdata___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__grad___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__grad_fn___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__post_accumulate_grad_hooks___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase__version___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_data___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_device___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_dtype___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_grad___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_grad_dtype___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_grad_fn___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_imag___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_cpu___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_cuda___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_ipu___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_leaf___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_maia___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_meta___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_mkldnn___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_mps___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_mtia___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_nested___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_quantized___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_sparse___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_sparse_csr___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_vulkan___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_xla___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_is_xpu___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_itemsize___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_layout___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_mH___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_mT___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_name___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_names___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_nbytes___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_ndim___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_output_nr___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_real___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_requires_grad___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_retains_grad___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_shape___get__, test/test_overrides.py::TestTorchFunctionOverride::test_TensorBase_volatile___get__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___add__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___and__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___array__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___array_wrap__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___bool__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___complex__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___contains__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___cuda_array_interface_____get__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___deepcopy__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___div__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___dlpack__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___dlpack_device__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___eq__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___float__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___floordiv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___format__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ge__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___getitem__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___gt__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___iadd__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___iand__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___idiv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ifloordiv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ilshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___imod__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___imul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___index__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___int__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___invert__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ior__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___irshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___isub__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ixor__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___le__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___len__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___long__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___lshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___lt__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___matmul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___mod__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___mul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ne__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___nonzero__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___or__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___radd__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rand__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rdiv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___reduce_ex__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___repr__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___reversed__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rfloordiv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rlshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rmatmul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rmod__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rmul__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___ror__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rpow__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rrshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rshift__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rsub__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___rxor__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___setitem__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___setstate__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___sub__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___truediv__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor___xor__, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__autocast_to_full_precision, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__autocast_to_reduced_precision, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__clear_non_serializable_cached_data, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__coalesced_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__dimI, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__dimV, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__is_view, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__nested_tensor_size, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__nested_tensor_storage_offsets, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__nested_tensor_strides, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__nnz, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__sparse_mask_projection, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__to_dense, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__update_names, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor__values, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_abs, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_abs_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_absolute, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_absolute_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_acos, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_acos_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_acosh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_acosh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_add, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_add_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addbmm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addbmm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addcdiv, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addcdiv_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addcmul, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addcmul_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addmm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addmm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addmv, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addmv_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_addr_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_adjoint, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_align_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_align_to, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_all, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_allclose, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_amax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_amin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_aminmax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_angle, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_any, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_apply_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arccos, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arccos_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arccosh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arccosh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arcsin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arcsin_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arcsinh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arcsinh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctan, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctan2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctan2_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctan_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctanh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_arctanh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_argmax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_argmin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_argsort, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_argwhere, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_as_strided, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_as_strided_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_as_strided_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_asin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_asin_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_asinh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_asinh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atan, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atan2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atan2_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atan_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atanh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_atanh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_backward, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_baddbmm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_baddbmm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bernoulli, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bernoulli_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bfloat16, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bincount, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_and, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_and_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_left_shift, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_left_shift_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_not, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_not_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_or, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_or_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_right_shift, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_right_shift_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_xor, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bitwise_xor_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bmm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_bool, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_broadcast_to, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_byte, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cauchy_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ccol_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cdouble, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ceil, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ceil_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cfloat, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_chalf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_char, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cholesky, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cholesky_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cholesky_solve, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_chunk, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_max, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_max_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_min, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clamp_min_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clip, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clip_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_clone, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_coalesce, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_col_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_conj, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_conj_physical, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_conj_physical_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_contiguous, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_copy_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_copysign, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_copysign_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_corrcoef, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cos, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cos_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cosh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cosh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_count_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cov, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cpu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cross, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_crow_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cuda, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cummax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cummin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cumprod, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cumprod_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cumsum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_cumsum_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_data_ptr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_deg2rad, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_deg2rad_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dense_dim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dequantize, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_det, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_detach, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_detach_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diag, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diag_embed, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diagflat, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diagonal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diagonal_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_diff, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_digamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_digamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dim_order, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dist, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_div, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_div_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_divide, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_divide_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dot, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_double, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_dsplit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_element_size, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_eq, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_eq_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_equal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erf_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erfc, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erfc_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erfinv, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_erfinv_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exp2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exp2_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exp_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_expand, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_expand_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_expm1, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_expm1_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_exponential_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fill_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fill_diagonal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fix, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fix_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_flatten, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_flip, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fliplr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_flipud, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_float, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_float_power, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_float_power_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_floor, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_floor_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_floor_divide, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_floor_divide_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fmax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fmin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fmod, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_fmod_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_frac, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_frac_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_frexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gather, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gcd, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gcd_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ge, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ge_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_geometric_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_geqrf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ger, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_get_device, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_greater, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_greater_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_greater_equal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_greater_equal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gt, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_gt_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_half, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hardshrink, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_has_names, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hash_tensor, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_heaviside, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_heaviside_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_histc, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_histogram, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hsplit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hypot, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_hypot_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_i0, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_i0_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_igamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_igamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_igammac, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_igammac_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_add, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_add_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_copy, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_copy_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_fill, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_fill_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_put, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_put_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_reduce_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_index_select, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_inner, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_int, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_int_repr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ipu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_coalesced, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_complex, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_conj, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_contiguous, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_distributed, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_floating_point, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_inference, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_neg, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_pinned, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_same_size, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_set_to, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_shared, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_is_signed, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isclose, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isfinite, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isinf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isnan, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isneginf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isposinf, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_isreal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_istft, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_item, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_kron, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_kthvalue, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lcm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lcm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ldexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ldexp_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_le, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_le_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lerp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lerp_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_less, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_less_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_less_equal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_less_equal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lgamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lgamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log10, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log10_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log1p, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log1p_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log2_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log_normal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_log_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logaddexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logaddexp2, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logcumsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logdet, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_and, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_and_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_not, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_not_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_or, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_or_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_xor, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logical_xor_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logit_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_logsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_long, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lt, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lt_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_lu_solve, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_map2_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_map_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_fill, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_fill_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_scatter_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_masked_select, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_matmul, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_matrix_exp, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_matrix_power, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_max, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_maximum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mean, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_median, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_min, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_minimum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mode, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_module_load, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_moveaxis, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_movedim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_msort, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mtia, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mul, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mul_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_multinomial, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_multiply, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_multiply_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mv, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mvlgamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_mvlgamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nan_to_num, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nan_to_num_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nanmean, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nanmedian, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nanquantile, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nansum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_narrow, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_narrow_copy, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ndimension, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ne, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ne_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_neg, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_neg_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_negative, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_negative_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nelement, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nextafter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nextafter_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_nonzero_static, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_norm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_normal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_not_equal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_not_equal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_numel, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_numpy, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_orgqr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ormqr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_outer, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_permute, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_pin_memory, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_pinverse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_polygamma, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_polygamma_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_positive, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_pow, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_pow_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_prelu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_prod, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_put, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_put_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_per_channel_axis, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_per_channel_scales, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_per_channel_zero_points, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_scale, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_q_zero_point, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_qr, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_qscheme, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_quantile, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rad2deg, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rad2deg_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_random_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_ravel, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_reciprocal, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_reciprocal_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_record_stream, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_refine_names, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_register_hook, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_register_post_accumulate_grad_hook, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_relu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_relu_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_remainder, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_remainder_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rename, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rename_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_renorm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_renorm_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_repeat, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_repeat_interleave, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_requires_grad_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_reshape, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_reshape_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize_as_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resize_as_sparse_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resolve_conj, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_resolve_neg, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_retain_grad, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_roll, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rot90, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_round, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_round_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_row_indices, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rsqrt, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_rsqrt_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_add, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_add_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_scatter_reduce_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_select, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_select_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_set_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sgn, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sgn_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_share_memory_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_short, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sigmoid, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sigmoid_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sign, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sign_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_signbit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sin, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sin_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sinc, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sinc_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sinh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sinh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_size, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_slice_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_slice_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_slogdet, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_smm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sort, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sparse_dim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sparse_mask, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sparse_resize_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sparse_resize_and_clear_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_split, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_split_with_sizes, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sqrt, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sqrt_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_square, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_square_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_squeeze, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_squeeze_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sspaddmm, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_std, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_stft, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_storage, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_storage_offset, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_storage_type, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sub, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sub_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_subtract, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_subtract_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sum, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_sum_to_size, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_svd, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_swapaxes, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_swapaxes_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_swapdims, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_swapdims_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_t, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_t_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_take, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_take_along_dim, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tan, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tan_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tanh, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tanh_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tensor_split, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tile, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_to, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_to_dense, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_to_mkldnn, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_to_sparse, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tolist, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_topk, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_trace, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_transpose, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_transpose_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_triangular_solve, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tril, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_tril_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_triu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_triu_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_true_divide, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_true_divide_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_trunc, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_trunc_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_type, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_type_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unbind, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unfold, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_uniform_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unique, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unique_consecutive, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsafe_chunk, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsafe_split, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsafe_split_with_sizes, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsqueeze, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_unsqueeze_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_untyped_storage, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_values, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_var, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_vdot, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_view, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_view_as, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_vsplit, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_where, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_xlogy, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_xlogy_, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_xpu, test/test_overrides.py::TestTorchFunctionOverride::test_Tensor_zero_, test/test_overrides.py::TestTorchFunctionOverride::test_base, test/test_overrides.py::TestTorchFunctionOverride::test_dtype_override, test/test_overrides.py::TestTorchFunctionOverride::test_grad, test/test_overrides.py::TestTorchFunctionOverride::test_has_torch_function_non_sequence, test/test_overrides.py::TestTorchFunctionOverride::test_mean_semantics, test/test_overrides.py::TestTorchFunctionOverride::test_mm_semantics, test/test_overrides.py::TestTorchFunctionOverride::test_pow_rpow, test/test_overrides.py::TestTorchFunctionOverride::test_precedence_semantics, test/test_overrides.py::TestTorchFunctionOverride::test_tensor_subclass_propagation, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_fft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_fft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_fftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_fftshift, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_hfft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_hfft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_hfftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ifft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ifft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ifftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ifftshift, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ihfft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ihfft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_ihfftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_irfft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_irfft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_irfftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_rfft, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_rfft2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__fft_fft_rfftn, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_cholesky, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_cholesky_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_cond, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_cross, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_det, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_diagonal, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_eig, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_eigh, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_eigvals, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_eigvalsh, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_householder_product, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_inv, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_inv_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_ldl_factor, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_ldl_factor_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_ldl_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lstsq, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lu, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lu_factor, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lu_factor_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_lu_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matmul, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matrix_exp, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matrix_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matrix_power, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_matrix_rank, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_multi_dot, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_pinv, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_qr, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_slogdet, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_solve_ex, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_solve_triangular, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_svd, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_svdvals, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_tensorinv, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_tensorsolve, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_vander, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_vecdot, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__linalg_linalg_vector_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_avg_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_avg_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_gelu, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_linear, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_log_sigmoid, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_one_hot, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_scaled_dot_product_attention, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_softplus, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__nn_softshrink, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_airy_ai, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_bessel_j0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_bessel_j1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_bessel_y0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_bessel_y1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_chebyshev_polynomial_t, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_chebyshev_polynomial_u, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_chebyshev_polynomial_v, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_chebyshev_polynomial_w, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_digamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_entr, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_erf, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_erfc, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_erfcx, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_erfinv, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_exp2, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_expit, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_expm1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_gammainc, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_gammaincc, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_gammaln, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_hermite_polynomial_h, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_hermite_polynomial_he, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_i0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_i0e, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_i1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_i1e, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_laguerre_polynomial_l, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_legendre_polynomial_p, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_log1p, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_log_ndtr, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_log_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_logit, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_logsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_modified_bessel_i0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_modified_bessel_i1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_modified_bessel_k0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_modified_bessel_k1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_multigammaln, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_ndtr, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_ndtri, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_polygamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_psi, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_round, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_scaled_modified_bessel_k0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_scaled_modified_bessel_k1, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_shifted_chebyshev_polynomial_t, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_shifted_chebyshev_polynomial_u, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_shifted_chebyshev_polynomial_v, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_shifted_chebyshev_polynomial_w, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_sinc, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_spherical_bessel_j0, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_xlog1py, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_xlogy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__C__special_special_zeta, test/test_overrides.py::TestTorchFunctionOverride::test_torch__assert_async, test/test_overrides.py::TestTorchFunctionOverride::test_torch__conj_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__functional_assert_async, test/test_overrides.py::TestTorchFunctionOverride::test_torch__fused_rms_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch__fw_primal_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__lobpcg_lobpcg, test/test_overrides.py::TestTorchFunctionOverride::test_torch__lowrank_pca_lowrank, test/test_overrides.py::TestTorchFunctionOverride::test_torch__lowrank_svd_lowrank, test/test_overrides.py::TestTorchFunctionOverride::test_torch__make_dual_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__native_batch_norm_legit, test/test_overrides.py::TestTorchFunctionOverride::test_torch__neg_view_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__reshape_alias_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__rowwise_prune, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sparse_broadcast_to_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_acos, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_asin, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_atan, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_cos, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_cosh, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_sin, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_sinh, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_sqrt, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_tan, test/test_overrides.py::TestTorchFunctionOverride::test_torch__sym_tanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch__values_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch__wrapped_linear_prepack, test/test_overrides.py::TestTorchFunctionOverride::test_torch__wrapped_quantized_linear_prepacked, test/test_overrides.py::TestTorchFunctionOverride::test_torch_abs, test/test_overrides.py::TestTorchFunctionOverride::test_torch_absolute, test/test_overrides.py::TestTorchFunctionOverride::test_torch_acos, test/test_overrides.py::TestTorchFunctionOverride::test_torch_acosh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_adaptive_avg_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_adaptive_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_add, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addbmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addcdiv, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addcmul, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addmv, test/test_overrides.py::TestTorchFunctionOverride::test_torch_addr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_adjoint, test/test_overrides.py::TestTorchFunctionOverride::test_torch_affine_grid_generator, test/test_overrides.py::TestTorchFunctionOverride::test_torch_alias_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_all, test/test_overrides.py::TestTorchFunctionOverride::test_torch_allclose, test/test_overrides.py::TestTorchFunctionOverride::test_torch_alpha_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_amax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_amin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_aminmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_angle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_any, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arccos, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arccosh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arcsin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arcsinh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arctan, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arctan2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_arctanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_argmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_argmin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_argsort, test/test_overrides.py::TestTorchFunctionOverride::test_torch_argwhere, test/test_overrides.py::TestTorchFunctionOverride::test_torch_as_strided_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_as_strided_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_asin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_asinh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_atan, test/test_overrides.py::TestTorchFunctionOverride::test_torch_atan2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_atanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_avg_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_baddbmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_backward_elemt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_backward_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_elemt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_gather_stats, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_gather_stats_with_counts, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_stats, test/test_overrides.py::TestTorchFunctionOverride::test_torch_batch_norm_update_stats, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bernoulli, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bilinear, test/test_overrides.py::TestTorchFunctionOverride::test_torch_binary_cross_entropy_with_logits, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bincount, test/test_overrides.py::TestTorchFunctionOverride::test_torch_binomial, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_and, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_left_shift, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_not, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_or, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_right_shift, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bitwise_xor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_broadcast_to, test/test_overrides.py::TestTorchFunctionOverride::test_torch_bucketize, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cat, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ccol_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ceil, test/test_overrides.py::TestTorchFunctionOverride::test_torch_celu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_channel_shuffle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cholesky, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cholesky_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cholesky_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch_choose_qparams_optimized, test/test_overrides.py::TestTorchFunctionOverride::test_torch_chunk, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clamp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clamp_max, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clamp_min, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clip, test/test_overrides.py::TestTorchFunctionOverride::test_torch_clone, test/test_overrides.py::TestTorchFunctionOverride::test_torch_col_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_column_stack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_combinations, test/test_overrides.py::TestTorchFunctionOverride::test_torch_complex, test/test_overrides.py::TestTorchFunctionOverride::test_torch_concat, test/test_overrides.py::TestTorchFunctionOverride::test_torch_concatenate, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conj, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conj_physical, test/test_overrides.py::TestTorchFunctionOverride::test_torch_constant_pad_nd, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv_tbc, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv_transpose1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv_transpose2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_conv_transpose3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_convolution, test/test_overrides.py::TestTorchFunctionOverride::test_torch_copysign, test/test_overrides.py::TestTorchFunctionOverride::test_torch_corrcoef, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cos, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cosh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cosine_embedding_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cosine_similarity, test/test_overrides.py::TestTorchFunctionOverride::test_torch_count_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cov, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cross, test/test_overrides.py::TestTorchFunctionOverride::test_torch_crow_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ctc_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cummax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cummin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cumprod, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cumsum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_cumulative_trapezoid, test/test_overrides.py::TestTorchFunctionOverride::test_torch_deg2rad, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dequantize, test/test_overrides.py::TestTorchFunctionOverride::test_torch_det, test/test_overrides.py::TestTorchFunctionOverride::test_torch_detach, test/test_overrides.py::TestTorchFunctionOverride::test_torch_detach_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diag_embed, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diagflat, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diagonal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diagonal_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diagonal_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_diff, test/test_overrides.py::TestTorchFunctionOverride::test_torch_digamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dist, test/test_overrides.py::TestTorchFunctionOverride::test_torch_div, test/test_overrides.py::TestTorchFunctionOverride::test_torch_divide, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dot, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dsmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dsplit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_dstack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_embedding, test/test_overrides.py::TestTorchFunctionOverride::test_torch_embedding_bag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_empty_like, test/test_overrides.py::TestTorchFunctionOverride::test_torch_eq, test/test_overrides.py::TestTorchFunctionOverride::test_torch_equal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_erf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_erfc, test/test_overrides.py::TestTorchFunctionOverride::test_torch_erfinv, test/test_overrides.py::TestTorchFunctionOverride::test_torch_exp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_exp2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_expand_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_expm1, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fake_quantize_per_channel_affine, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fake_quantize_per_tensor_affine, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_fp16_weight, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_fp16_weight_fp32_activation, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_int8_weight, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_int8_weight_fp32_activation, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_linear_quantize_weight, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_pack_gemm_matrix_fp16, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fbgemm_pack_quantized_matrix, test/test_overrides.py::TestTorchFunctionOverride::test_torch_feature_alpha_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_feature_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fix, test/test_overrides.py::TestTorchFunctionOverride::test_torch_flatten, test/test_overrides.py::TestTorchFunctionOverride::test_torch_flip, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fliplr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_flipud, test/test_overrides.py::TestTorchFunctionOverride::test_torch_float_power, test/test_overrides.py::TestTorchFunctionOverride::test_torch_floor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_floor_divide, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fmin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fmod, test/test_overrides.py::TestTorchFunctionOverride::test_torch_frac, test/test_overrides.py::TestTorchFunctionOverride::test_torch_frexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_frobenius_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_full_like, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_empty_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_in_float_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_in_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_in_scalar_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_mixed_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_nested_tuple_getitem, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_not_first_in_list, test/test_overrides.py::TestTorchFunctionOverride::test_torch_function_precedence_in_lists, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_atleast_1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_atleast_2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_atleast_3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_block_diag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_broadcast_tensors, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_cartesian_prod, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_cdist, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_chain_matmul, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_einsum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_lu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_meshgrid, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_split, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_stft, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_tensordot, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_unique, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_unique_consecutive, test/test_overrides.py::TestTorchFunctionOverride::test_torch_functional_unravel_index, test/test_overrides.py::TestTorchFunctionOverride::test_torch_fused_moving_avg_obs_fake_quant, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gather, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gcd, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ge, test/test_overrides.py::TestTorchFunctionOverride::test_torch_geqrf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ger, test/test_overrides.py::TestTorchFunctionOverride::test_torch_get_device, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gradient, test/test_overrides.py::TestTorchFunctionOverride::test_torch_greater, test/test_overrides.py::TestTorchFunctionOverride::test_torch_greater_equal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_grid_sampler, test/test_overrides.py::TestTorchFunctionOverride::test_torch_grid_sampler_2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_grid_sampler_3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_group_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gru, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gru_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_gt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hardshrink, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hash_tensor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_heaviside, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hinge_embedding_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_histc, test/test_overrides.py::TestTorchFunctionOverride::test_torch_histogram, test/test_overrides.py::TestTorchFunctionOverride::test_torch_histogramdd, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hsmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hsplit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hstack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_hypot, test/test_overrides.py::TestTorchFunctionOverride::test_torch_i0, test/test_overrides.py::TestTorchFunctionOverride::test_torch_igamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch_igammac, test/test_overrides.py::TestTorchFunctionOverride::test_torch_imag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_add, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_fill, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_put, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_torch_index_select, test/test_overrides.py::TestTorchFunctionOverride::test_torch_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_inner, test/test_overrides.py::TestTorchFunctionOverride::test_torch_instance_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_int_repr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_complex, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_conj, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_distributed, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_floating_point, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_inference, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_neg, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_same_size, test/test_overrides.py::TestTorchFunctionOverride::test_torch_is_signed, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isclose, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isfinite, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isinf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isnan, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isneginf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isposinf, test/test_overrides.py::TestTorchFunctionOverride::test_torch_isreal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_istft, test/test_overrides.py::TestTorchFunctionOverride::test_torch_kl_div, test/test_overrides.py::TestTorchFunctionOverride::test_torch_kron, test/test_overrides.py::TestTorchFunctionOverride::test_torch_kthvalue, test/test_overrides.py::TestTorchFunctionOverride::test_torch_layer_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lcm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ldexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_le, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lerp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_less, test/test_overrides.py::TestTorchFunctionOverride::test_torch_less_equal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lgamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch_log, test/test_overrides.py::TestTorchFunctionOverride::test_torch_log10, test/test_overrides.py::TestTorchFunctionOverride::test_torch_log1p, test/test_overrides.py::TestTorchFunctionOverride::test_torch_log2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_log_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logaddexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logaddexp2, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logcumsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logdet, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logical_and, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logical_not, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logical_or, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logical_xor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_logsumexp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lstm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lstm_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lu_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch_lu_unpack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_margin_ranking_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_masked_fill, test/test_overrides.py::TestTorchFunctionOverride::test_torch_masked_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_masked_select, test/test_overrides.py::TestTorchFunctionOverride::test_torch_matmul, test/test_overrides.py::TestTorchFunctionOverride::test_torch_matrix_exp, test/test_overrides.py::TestTorchFunctionOverride::test_torch_matrix_power, test/test_overrides.py::TestTorchFunctionOverride::test_torch_max, test/test_overrides.py::TestTorchFunctionOverride::test_torch_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_max_pool1d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_max_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_maximum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mean, test/test_overrides.py::TestTorchFunctionOverride::test_torch_median, test/test_overrides.py::TestTorchFunctionOverride::test_torch_min, test/test_overrides.py::TestTorchFunctionOverride::test_torch_minimum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_convolution, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_convolution_add_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_convolution_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_convolution_transpose, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_depthwise_convolution, test/test_overrides.py::TestTorchFunctionOverride::test_torch_miopen_rnn, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mode, test/test_overrides.py::TestTorchFunctionOverride::test_torch_moveaxis, test/test_overrides.py::TestTorchFunctionOverride::test_torch_movedim, test/test_overrides.py::TestTorchFunctionOverride::test_torch_msort, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mul, test/test_overrides.py::TestTorchFunctionOverride::test_torch_multinomial, test/test_overrides.py::TestTorchFunctionOverride::test_torch_multiply, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mv, test/test_overrides.py::TestTorchFunctionOverride::test_torch_mvlgamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nan_to_num, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nanmean, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nanmedian, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nanquantile, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nansum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_narrow, test/test_overrides.py::TestTorchFunctionOverride::test_torch_narrow_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_channel_shuffle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_group_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_layer_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_native_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ne, test/test_overrides.py::TestTorchFunctionOverride::test_torch_neg, test/test_overrides.py::TestTorchFunctionOverride::test_torch_negative, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nextafter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional__threshold, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_avg_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_avg_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool1d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool2d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_adaptive_max_pool3d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_affine_grid, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_alpha_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_binary_cross_entropy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_binary_cross_entropy_with_logits, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_celu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_cosine_embedding_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_cross_entropy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_ctc_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_dropout1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_dropout2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_dropout3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_elu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_embedding, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_embedding_bag, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_feature_alpha_dropout, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fold, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fractional_max_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fractional_max_pool2d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fractional_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_fractional_max_pool3d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_gaussian_nll_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_glu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_grid_sample, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_group_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_gumbel_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_hardtanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_hinge_embedding_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_huber_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_instance_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_interpolate, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_kl_div, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_l1_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_layer_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_leaky_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_local_response_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_log_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_lp_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_lp_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_lp_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_margin_ranking_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool1d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool2d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_pool3d_with_indices, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_unpool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_unpool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_max_unpool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_mish, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_mse_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_multi_head_attention_forward, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_multi_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_multilabel_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_multilabel_soft_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_nll_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_normalize, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_pad, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_poisson_nll_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_relu6, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_rms_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_rrelu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_selu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_silu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_smooth_l1_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_soft_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_softmin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_softsign, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_tanhshrink, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_triplet_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_triplet_margin_with_distance_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_functional_unfold, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_init_constant_, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_init_kaiming_uniform_, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_init_normal_, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nn_init_uniform_, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nonzero, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nonzero_static, test/test_overrides.py::TestTorchFunctionOverride::test_torch_norm_except_dim, test/test_overrides.py::TestTorchFunctionOverride::test_torch_not_equal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_nuclear_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_numel, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ones_like, test/test_overrides.py::TestTorchFunctionOverride::test_torch_orgqr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ormqr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_outer, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pairwise_distance, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pdist, test/test_overrides.py::TestTorchFunctionOverride::test_torch_permute, test/test_overrides.py::TestTorchFunctionOverride::test_torch_permute_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pinverse, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pixel_shuffle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pixel_unshuffle, test/test_overrides.py::TestTorchFunctionOverride::test_torch_poisson, test/test_overrides.py::TestTorchFunctionOverride::test_torch_poisson_nll_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_polar, test/test_overrides.py::TestTorchFunctionOverride::test_torch_polygamma, test/test_overrides.py::TestTorchFunctionOverride::test_torch_positive, test/test_overrides.py::TestTorchFunctionOverride::test_torch_pow, test/test_overrides.py::TestTorchFunctionOverride::test_torch_prelu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_prod, test/test_overrides.py::TestTorchFunctionOverride::test_torch_put, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_per_channel_axis, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_per_channel_scales, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_per_channel_zero_points, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_scale, test/test_overrides.py::TestTorchFunctionOverride::test_torch_q_zero_point, test/test_overrides.py::TestTorchFunctionOverride::test_torch_qr, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantile, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantize_per_channel, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantize_per_tensor, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantize_per_tensor_dynamic, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_batch_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_gru_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_lstm_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_max_pool1d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_max_pool2d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_max_pool3d, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_rnn_relu_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_quantized_rnn_tanh_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rad2deg, test/test_overrides.py::TestTorchFunctionOverride::test_torch_ravel, test/test_overrides.py::TestTorchFunctionOverride::test_torch_real, test/test_overrides.py::TestTorchFunctionOverride::test_torch_reciprocal, test/test_overrides.py::TestTorchFunctionOverride::test_torch_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_remainder, test/test_overrides.py::TestTorchFunctionOverride::test_torch_renorm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_repeat_interleave, test/test_overrides.py::TestTorchFunctionOverride::test_torch_reshape, test/test_overrides.py::TestTorchFunctionOverride::test_torch_resolve_conj, test/test_overrides.py::TestTorchFunctionOverride::test_torch_resolve_neg, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rms_norm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rnn_relu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rnn_relu_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rnn_tanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rnn_tanh_cell, test/test_overrides.py::TestTorchFunctionOverride::test_torch_roll, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rot90, test/test_overrides.py::TestTorchFunctionOverride::test_torch_round, test/test_overrides.py::TestTorchFunctionOverride::test_torch_row_indices_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_row_stack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rrelu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rsqrt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_rsub, test/test_overrides.py::TestTorchFunctionOverride::test_torch_saddmm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_scatter_add, test/test_overrides.py::TestTorchFunctionOverride::test_torch_scatter_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_torch_searchsorted, test/test_overrides.py::TestTorchFunctionOverride::test_torch_segment_reduce, test/test_overrides.py::TestTorchFunctionOverride::test_torch_select, test/test_overrides.py::TestTorchFunctionOverride::test_torch_select_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_select_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_selu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sgn, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sigmoid, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sign, test/test_overrides.py::TestTorchFunctionOverride::test_torch_signbit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sin, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sinc, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sinh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_slice_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_slice_inverse, test/test_overrides.py::TestTorchFunctionOverride::test_torch_slice_scatter, test/test_overrides.py::TestTorchFunctionOverride::test_torch_slogdet, test/test_overrides.py::TestTorchFunctionOverride::test_torch_smm, test/test_overrides.py::TestTorchFunctionOverride::test_torch_softmax, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sort, test/test_overrides.py::TestTorchFunctionOverride::test_torch_split_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_split_with_sizes, test/test_overrides.py::TestTorchFunctionOverride::test_torch_split_with_sizes_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sqrt, test/test_overrides.py::TestTorchFunctionOverride::test_torch_square, test/test_overrides.py::TestTorchFunctionOverride::test_torch_squeeze, test/test_overrides.py::TestTorchFunctionOverride::test_torch_squeeze_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_stack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_std, test/test_overrides.py::TestTorchFunctionOverride::test_torch_std_mean, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sub, test/test_overrides.py::TestTorchFunctionOverride::test_torch_subtract, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_svd, test/test_overrides.py::TestTorchFunctionOverride::test_torch_swapaxes, test/test_overrides.py::TestTorchFunctionOverride::test_torch_swapdims, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_float, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_int, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_ite, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_max, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_min, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_not, test/test_overrides.py::TestTorchFunctionOverride::test_torch_sym_sum, test/test_overrides.py::TestTorchFunctionOverride::test_torch_t, test/test_overrides.py::TestTorchFunctionOverride::test_torch_t_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_take, test/test_overrides.py::TestTorchFunctionOverride::test_torch_take_along_dim, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tan, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tanh, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tensor_split, test/test_overrides.py::TestTorchFunctionOverride::test_torch_threshold, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tile, test/test_overrides.py::TestTorchFunctionOverride::test_torch_topk, test/test_overrides.py::TestTorchFunctionOverride::test_torch_trace, test/test_overrides.py::TestTorchFunctionOverride::test_torch_transpose, test/test_overrides.py::TestTorchFunctionOverride::test_torch_transpose_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_trapezoid, test/test_overrides.py::TestTorchFunctionOverride::test_torch_trapz, test/test_overrides.py::TestTorchFunctionOverride::test_torch_triangular_solve, test/test_overrides.py::TestTorchFunctionOverride::test_torch_tril, test/test_overrides.py::TestTorchFunctionOverride::test_torch_triplet_margin_loss, test/test_overrides.py::TestTorchFunctionOverride::test_torch_triu, test/test_overrides.py::TestTorchFunctionOverride::test_torch_true_divide, test/test_overrides.py::TestTorchFunctionOverride::test_torch_trunc, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unbind, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unbind_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unflatten, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unfold_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsafe_chunk, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsafe_split, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsafe_split_with_sizes, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsqueeze, test/test_overrides.py::TestTorchFunctionOverride::test_torch_unsqueeze_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_values_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_var, test/test_overrides.py::TestTorchFunctionOverride::test_torch_var_mean, test/test_overrides.py::TestTorchFunctionOverride::test_torch_vdot, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_as_complex, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_as_complex_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_as_real, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_as_real_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_view_copy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_vsplit, test/test_overrides.py::TestTorchFunctionOverride::test_torch_vstack, test/test_overrides.py::TestTorchFunctionOverride::test_torch_where, test/test_overrides.py::TestTorchFunctionOverride::test_torch_xlogy, test/test_overrides.py::TestTorchFunctionOverride::test_torch_zeros_like, test/test_overrides.py::TestTorchFunctionOverride::test_user_implementation_raises, test/test_overrides.py::TestEinsumOverride::test_wrapper, test/test_overrides.py::TestGradCheckOverride::test_gradcheck, test/test_overrides.py::TestNamedTuple::test_max, test/test_overrides.py::TestGradNewOnesOverride::test_newones, test/test_overrides.py::TestPickle::test_pickle, test/test_overrides.py::TestBroadcastAllOverride::test_broadcast_all, test/test_overrides.py::TestWrapTorchFunction::test_wrap_torch_function, test/test_overrides.py::TestIndexing::test_getitem, test/test_overrides.py::TestIndexing::test_getitem_subclass, test/test_overrides.py::TestIndexing::test_setitem, test/test_overrides.py::TestIndexing::test_setitem_subclass, test/test_overrides.py::TestIndexing::test_setitem_val, test/test_overrides.py::TestIterator::test_iterator, test/test_overrides.py::TestRNN::test_rnn, test/test_overrides.py::TestDisabledTorchFunction::test_parameter_does_not_prevent_dispatch, test/test_overrides.py::TestResolveName::test_resolve_name, test/test_overrides.py::TestTorchFunctionWarning::test_torch_function_standalone_class, test/test_overrides.py::TestTorchFunctionWarning::test_torch_function_tensor_subclass, test/test_overrides.py::TestDisabledUserWarnings::test_no_implicit_user_warning_for_deprecated_functions, test/test_overrides.py::TestTorchFunctionMode::test_all_same_mode, test/test_overrides.py::TestTorchFunctionMode::test_basic, test/test_overrides.py::TestTorchFunctionMode::test_custom_device_type, test/test_overrides.py::TestTorchFunctionMode::test_device_context_semantics, test/test_overrides.py::TestTorchFunctionMode::test_disable_enable_subclass, test/test_overrides.py::TestTorchFunctionMode::test_disable_enable_torch_function_ctx, test/test_overrides.py::TestTorchFunctionMode::test_disable_subclass_mode, test/test_overrides.py::TestTorchFunctionMode::test_disable_subclass_not_mode, test/test_overrides.py::TestTorchFunctionMode::test_distributions_bernoulli, test/test_overrides.py::TestTorchFunctionMode::test_error_using_class_method_on_mode, test/test_overrides.py::TestTorchFunctionMode::test_factory_override, test/test_overrides.py::TestTorchFunctionMode::test_get_cur_mode, test/test_overrides.py::TestTorchFunctionMode::test_get_mode_stack, test/test_overrides.py::TestTorchFunctionMode::test_getitem_call, test/test_overrides.py::TestTorchFunctionMode::test_mode_notimplemented_loop, test/test_overrides.py::TestTorchFunctionMode::test_modes_handle_first, test/test_overrides.py::TestTorchFunctionMode::test_modes_return_notimplemented, test/test_overrides.py::TestTorchFunctionMode::test_nested_modes_with_python_has_torch_function, test/test_overrides.py::TestTorchFunctionMode::test_nested_same_mode, test/test_overrides.py::TestTorchFunctionMode::test_nn_parse_to, test/test_overrides.py::TestTorchFunctionMode::test_reentrant_mode_idiom, test/test_overrides.py::TestTorchFunctionMode::test_restacking_with_ancestor, test/test_overrides.py::TestTorchFunctionMode::test_subclass_hash, test/test_overrides.py::TestTorchFunctionMode::test_torch_function_all_disabled_api, test/test_overrides.py::TestTorchFunctionMode::test_with_mode, test/test_overrides.py::TestTorchFunctionMode::test_with_mode_created_separately, test/test_overrides.py::TestTorchFunctionMode::test_with_nested_modes
2025-12-04T14:35:35.9843532Z 
2025-12-04T14:35:35.9843754Z Finished test_overrides 1/1 ... [2025-12-04 14:35:35.921063][20575.849356173], took 0.11min
2025-12-04T14:35:35.9844446Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_overrides/test_overrides-54542bcdcb986158.xml
2025-12-04T14:35:36.0748818Z Running torch_np/test_function_base 1/1 ... [2025-12-04 14:35:36.074636][20576.002933099]
2025-12-04T14:35:36.0749274Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:35:36.0752275Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_function_base.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:35:36.074949]
2025-12-04T14:35:39.3453708Z 
2025-12-04T14:35:39.3454663Z torch_np/test_function_base 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_function_base_1.1_500584d725d1f2e7_.log
2025-12-04T14:35:39.3455875Z Running 2 items in this shard: test/torch_np/test_function_base.py::TestAppend::test_basic, test/torch_np/test_function_base.py::TestMisc::test_broadcast_shapes
2025-12-04T14:35:39.3456463Z 
2025-12-04T14:35:39.3456754Z Finished torch_np/test_function_base 1/1 ... [2025-12-04 14:35:39.345005][20579.273296317], took 0.05min
2025-12-04T14:35:39.3841406Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_function_base/torch_np.test_function_base-75280ab4c9ebfe9c.xml
2025-12-04T14:35:39.4349197Z Running test_type_promotion 1/1 ... [2025-12-04 14:35:39.434665][20579.362963488]
2025-12-04T14:35:39.4349657Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:35:39.4352284Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_type_promotion.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:35:39.434965]
2025-12-04T14:35:49.1653450Z 
2025-12-04T14:35:49.1654742Z test_type_promotion 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_type_promotion_1.1_6db001c785be877d_.log
2025-12-04T14:35:49.1773544Z Running 423 items in this shard: test/test_type_promotion.py::TestTypePromotionCUDA::test_add_wrapped_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_alpha_mismatch_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_alternate_result_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_bfloat16_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_booleans_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_can_cast_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_cat_different_dtypes_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_cat_out_different_dtypes_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_bool_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float32_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_float64_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_clamp_type_promotion_cuda_int32_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_comparison_ops_with_type_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_assertraises_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_half_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_complex_scalar_mult_tensor_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_computation_ignores_out_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_create_bool_tensors_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_inplace_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_div_promotion_out_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_float_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_from_issue_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_half_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_indexing_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_indexing_fail_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_inplace_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_int_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_int_to_float_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_integer_addcdiv_deprecated_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_lt_with_type_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_many_promotions_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_mixed_type_backward_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_non_promoting_ops_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_bool_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex128_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_complex64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_float64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_int8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_numpy_array_binary_ufunc_promotion_cuda_uint8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_promote_self_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_promote_types_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bfloat16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_bool_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex128_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_complex64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_float64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int16_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int32_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int64_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_int8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_bfloat16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_int8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_cuda_uint8_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_result_type_tensor_vs_scalar_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_add_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_bool, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int16, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int32, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_div_promotion_cuda_uint8, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_mul_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_sparse_sub_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_ternary_out_promotion_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_transpose_cuda, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex128_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_complex64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float32_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_float64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_complex128, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_complex64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_float32, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_float64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unary_op_out_casting_cuda_int64_int64, test/test_type_promotion.py::TestTypePromotionCUDA::test_unsigned_cuda
2025-12-04T14:35:49.1886486Z 
2025-12-04T14:35:49.1886708Z Finished test_type_promotion 1/1 ... [2025-12-04 14:35:49.165786][20589.094079301], took 0.16min
2025-12-04T14:35:49.2060350Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_type_promotion/test_type_promotion-263da3bcb01bdba5.xml
2025-12-04T14:35:49.3134386Z Running torch_np/test_scalars_0D_arrays 1/1 ... [2025-12-04 14:35:49.313186][20589.241483944]
2025-12-04T14:35:49.3134883Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:35:49.3137728Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_scalars_0D_arrays.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:35:49.313510]
2025-12-04T14:35:52.6357587Z 
2025-12-04T14:35:52.6358483Z torch_np/test_scalars_0D_arrays 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_scalars_0D_arrays_1.1_ed0b8e715c6c1568_.log
2025-12-04T14:35:52.6368438Z Running 33 items in this shard: test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_array_scalar_basic_array, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_array_scalar_basic_asarray, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_array_scalar_basic_asarray_int, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_array_scalar_basic_int64, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_conversion_to_int_array, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_conversion_to_int_asarray, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_conversion_to_int_asarray_int, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_conversion_to_int_int64, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_decay_to_py_scalar_array, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_decay_to_py_scalar_asarray, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_decay_to_py_scalar_asarray_int, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_decay_to_py_scalar_int64, test/torch_np/test_scalars_0D_arrays.py::TestArrayScalars::test_scalar_comparisons, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value0, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value1, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value10, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value11, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value4, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value5, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value6, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value7, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value8, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value9, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value_s, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_not_scalar_value_string, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_array_0D, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_array_1D, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_array_2D, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_float32, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_int, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_list, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_list-list, test/torch_np/test_scalars_0D_arrays.py::TestIsScalar::test_is_scalar_literal
2025-12-04T14:35:52.6376055Z 
2025-12-04T14:35:52.6376319Z Finished torch_np/test_scalars_0D_arrays 1/1 ... [2025-12-04 14:35:52.635505][20592.563796598], took 0.06min
2025-12-04T14:35:52.6749716Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_scalars_0D_arrays/torch_np.test_scalars_0D_arrays-0d870852e9b8ecce.xml
2025-12-04T14:35:52.7127394Z Running test_cuda_primary_ctx 1/1 ... [2025-12-04 14:35:52.712514][20592.64081291]
2025-12-04T14:35:52.7127821Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:35:52.7130704Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_primary_ctx.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:35:52.712815]
2025-12-04T14:36:08.9167130Z 
2025-12-04T14:36:08.9167967Z test_cuda_primary_ctx 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_primary_ctx_1.1_e42f702cb40c5a59_.log
2025-12-04T14:36:08.9169802Z Running 4 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_copy, test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_pin_memory, test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_set_device_0, test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_str_repr
2025-12-04T14:36:08.9171128Z Running 1 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_copy
2025-12-04T14:36:08.9171635Z Running 1 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_pin_memory
2025-12-04T14:36:08.9172382Z Running 1 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_set_device_0
2025-12-04T14:36:08.9172903Z Running 1 items in this shard: test/test_cuda_primary_ctx.py::TestCudaPrimaryCtx::test_str_repr
2025-12-04T14:36:08.9173200Z 
2025-12-04T14:36:08.9173412Z Finished test_cuda_primary_ctx 1/1 ... [2025-12-04 14:36:08.916597][20608.844892977], took 0.27min
2025-12-04T14:36:08.9567526Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-ed390b613a87fd4b.xml
2025-12-04T14:36:09.0441194Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-30e4b3748ee506f0.xml
2025-12-04T14:36:09.0800156Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-a99d9031384e91f6.xml
2025-12-04T14:36:09.1227254Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-f9602eb3757a7925.xml
2025-12-04T14:36:09.1561107Z Running profiler/test_profiler_tree 1/1 ... [2025-12-04 14:36:09.155852][20609.084151081]
2025-12-04T14:36:09.1561751Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:36:09.1565034Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_profiler_tree.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:36:09.156206]
2025-12-04T14:36:12.6789164Z 
2025-12-04T14:36:12.6801256Z profiler/test_profiler_tree 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_profiler_tree_1.1_8c9f11eb1f1e482c_.log
2025-12-04T14:36:12.6808201Z Running 10 items in this shard: test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda_detailed, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda_with_stream, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_memory, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_memory_and_stack, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_record_function, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_modules, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_torch_dispatch, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_torch_function
2025-12-04T14:36:12.6814346Z 
2025-12-04T14:36:12.6814771Z Finished profiler/test_profiler_tree 1/1 ... [2025-12-04 14:36:12.678603][20612.606895033], took 0.06min
2025-12-04T14:36:12.7194209Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_profiler_tree/profiler.test_profiler_tree-3ad20a65a3b0a20b.xml
2025-12-04T14:36:12.7664243Z Running torch_np/numpy_tests/lib/test_arraysetops 1/1 ... [2025-12-04 14:36:12.766166][20612.694465281]
2025-12-04T14:36:12.7664996Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:36:12.7667863Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_arraysetops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:36:12.766486]
2025-12-04T14:36:16.2880064Z 
2025-12-04T14:36:16.2881059Z torch_np/numpy_tests/lib/test_arraysetops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_arraysetops_1.1_330f28fa6e8cce7d_.log
2025-12-04T14:36:16.2898772Z Running 62 items in this shard: test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_forbidden_type_casts_ary0_prepend0_append_nan_expected_to_end, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_forbidden_type_casts_ary1_prepend1_append1_expected_to_begin, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_forbidden_type_casts_ary2_prepend_nan_append_nan_expected_to_begin, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_scalar_handling_ary0_prepend_65536_append_65540_expected0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_scalar_handling_ary1_prepend1_append1_expected1, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_scalar_handling_ary2_prepend_0_append_0_expected2, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_scalar_handling_ary3_prepend_3_append_-9_expected3, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_boolean_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_boolean_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_boolean_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_both_arrays_are_object, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_both_arrays_have_structured_dtype, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_char_array, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_errors, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_first_array_is_object, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_hit_alternate_algorithm, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_invert_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_invert_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_invert_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_boolean_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_boolean_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_boolean_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype10_dtype20_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype10_dtype20_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype10_dtype20_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype11_dtype21_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype11_dtype21_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype11_dtype21_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_ravel_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_ravel_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_ravel_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_second_array_is_object, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_table_timedelta_fails, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_timedelta_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_timedelta_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_with_arrays_containing_tuples, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_intersect1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_intersect1d_array_like, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_intersect1d_indices, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_isin_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_isin_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_isin_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_manyways, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_setdiff1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_setdiff1d_char_array, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_setdiff1d_unique, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_setxor1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_union1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_1d_2, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_1d_with_axis_axis_-1, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_1d_with_axis_axis_0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_axis, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_axis_errors, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_axis_list, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_axis_zeros, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_nanequals, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_sort_order_with_axis
2025-12-04T14:36:16.2915183Z 
2025-12-04T14:36:16.2915540Z Finished torch_np/numpy_tests/lib/test_arraysetops 1/1 ... [2025-12-04 14:36:16.287798][20616.216088967], took 0.06min
2025-12-04T14:36:16.3278766Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_arraysetops/torch_np.numpy_tests.lib.test_arraysetops-0f1d2c69ab44ce0e.xml
2025-12-04T14:36:16.3580868Z Running test_dlpack 1/1 ... [2025-12-04 14:36:16.357862][20616.286161747]
2025-12-04T14:36:16.3581272Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:36:16.3583839Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_dlpack.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:36:16.358145]
2025-12-04T14:36:20.2812206Z 
2025-12-04T14:36:20.2814228Z test_dlpack 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_dlpack_1.1_1ec0373c5e8d3363_.log
2025-12-04T14:36:20.2860710Z Running 154 items in this shard: test/test_dlpack.py::TestTorchDlPackCUDA::test_automatically_select_in_creation_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_copy_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_capsule_conversion_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_narrow_precision_cuda_float4_e2m1fn_x2, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_narrow_precision_cuda_float8_e4m3fn, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_narrow_precision_cuda_float8_e4m3fnuz, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_narrow_precision_cuda_float8_e5m2, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_narrow_precision_cuda_float8_e5m2fnuz, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_diff_streams_narrow_precision_cuda_float8_e8m0fnu, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_narrow_precision_cuda_float4_e2m1fn_x2, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_narrow_precision_cuda_float8_e4m3fn, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_narrow_precision_cuda_float8_e4m3fnuz, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_narrow_precision_cuda_float8_e5m2, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_narrow_precision_cuda_float8_e5m2fnuz, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_conversion_with_streams_narrow_precision_cuda_float8_e8m0fnu, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_convert_default_stream_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_cuda_per_thread_stream_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_default_stream_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_export_is_conj_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_export_non_strided_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_export_requires_grad_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_invalid_cpu_stream_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_invalid_cuda_streams_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_invalid_rocm_streams_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_normalize_strides_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_protocol_conversion_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_shared_storage_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_invalid_stream_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_tensor_on_different_device_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_dlpack_unsupported_dtype_error_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_dtype_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_bfloat16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_bool, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_from_dlpack_noncontinguous_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_max_version_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_needs_copy_error_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_no_copy_cuda, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_complex128, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_complex64, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_float16, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_float32, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_float64, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_int16, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_int32, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_int64, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_int8, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_uint16, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_uint32, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_uint64, test/test_dlpack.py::TestTorchDlPackCUDA::test_numpy_dlpack_protocol_conversion_cuda_uint8, test/test_dlpack.py::TestTorchDlPackCUDA::test_unsupported_device_error_cuda
2025-12-04T14:36:20.2898097Z 
2025-12-04T14:36:20.2898293Z Finished test_dlpack 1/1 ... [2025-12-04 14:36:20.281218][20620.209510858], took 0.07min
2025-12-04T14:36:20.3210617Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_dlpack/test_dlpack-41fa7f929a572602.xml
2025-12-04T14:36:20.3545346Z Running profiler/test_torch_tidy 1/1 ... [2025-12-04 14:36:20.354287][20620.282586988]
2025-12-04T14:36:20.3545865Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:36:20.3548203Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_torch_tidy.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:36:20.354561]
2025-12-04T14:36:26.7803206Z 
2025-12-04T14:36:26.7804714Z profiler/test_torch_tidy 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_torch_tidy_1.1_59c0964fd7f7a67f_.log
2025-12-04T14:36:26.7815711Z Running 22 items in this shard: test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_id_uniqueness, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_ids_with_other_ops, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocations, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_extra_fields, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_impl_reuse, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_mkldnn_tensors, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_module_and_optimizer_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_nnmodule_params, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer_parameters_adam, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer_parameters_sgd, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_pointers_and_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_refcounts, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_scalar_ins, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_sparse_tensors, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensor_lists, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensor_properties, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_full, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_keep_alive, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_scalar_args, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_set
2025-12-04T14:36:26.7821644Z 
2025-12-04T14:36:26.7821879Z Finished profiler/test_torch_tidy 1/1 ... [2025-12-04 14:36:26.779926][20626.708218852], took 0.11min
2025-12-04T14:36:26.8201097Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_torch_tidy/profiler.test_torch_tidy-6d6836cfdd083f06.xml
2025-12-04T14:36:26.8992610Z Running lazy/test_reuse_ir 1/1 ... [2025-12-04 14:36:26.899020][20626.82731895]
2025-12-04T14:36:26.8993040Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:36:26.8995792Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_reuse_ir.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:36:26.899318]
2025-12-04T14:36:30.3700886Z 
2025-12-04T14:36:30.3701700Z lazy/test_reuse_ir 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_reuse_ir_1.1_27a380ba8ffdf251_.log
2025-12-04T14:36:30.3703551Z Running 4 items in this shard: test/lazy/test_reuse_ir.py::TestLazyReuseIr::testAdd, test/lazy/test_reuse_ir.py::TestLazyReuseIr::testAddSub, test/lazy/test_reuse_ir.py::TestLazyReuseIr::testAddSubFallback, test/lazy/test_reuse_ir.py::TestLazyReuseIr::testBatchNorm
2025-12-04T14:36:30.3704536Z 
2025-12-04T14:36:30.3704778Z Finished lazy/test_reuse_ir 1/1 ... [2025-12-04 14:36:30.369828][20630.298126773], took 0.06min
2025-12-04T14:36:30.4088597Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/lazy.test_reuse_ir/lazy.test_reuse_ir-927d5809a72b7fa4.xml
2025-12-04T14:36:30.4425714Z Running test_functional_autograd_benchmark 1/1 ... [2025-12-04 14:36:30.442344][20630.370643435]
2025-12-04T14:36:30.4426212Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:36:30.4429014Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_functional_autograd_benchmark.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:36:30.442621]
2025-12-04T14:36:52.5447429Z 
2025-12-04T14:36:52.5448429Z test_functional_autograd_benchmark 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_functional_autograd_benchmark_1.1_a799b1a1c12db472_.log
2025-12-04T14:36:52.5449935Z Running 2 items in this shard: test/test_functional_autograd_benchmark.py::TestFunctionalAutogradBenchmark::test_fast_tasks, test/test_functional_autograd_benchmark.py::TestFunctionalAutogradBenchmark::test_slow_tasks
2025-12-04T14:36:52.5450759Z 
2025-12-04T14:36:52.5451115Z Finished test_functional_autograd_benchmark 1/1 ... [2025-12-04 14:36:52.544519][20652.472814605], took 0.37min
2025-12-04T14:36:52.5852109Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_functional_autograd_benchmark/test_functional_autograd_benchmark-0542bd395ed50334.xml
2025-12-04T14:36:52.6641421Z Running test_reductions 1/1 ... [2025-12-04 14:36:52.663852][20652.592150346]
2025-12-04T14:36:52.6641908Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:36:52.6644420Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_reductions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:36:52.664182]
2025-12-04T14:39:04.4946292Z 
2025-12-04T14:39:04.4947115Z test_reductions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_reductions_1.1_99f27656cb9f6489_.log
2025-12-04T14:39:04.6174704Z Running 4759 items in this shard: test/test_reductions.py::TestReductionsCUDA::test_accreal_type_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_all_any_vs_numpy_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_all_any_with_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_all_issue117215_cuda, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_amin_amax_some_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_aminmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_axis_with_dim_one_cuda, test/test_reductions.py::TestReductionsCUDA::test_argminmax_large_axis_cuda, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_argminmax_multiple_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_bincount_cuda, test/test_reductions.py::TestReductionsCUDA::test_bucketization_cuda, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_cumprod_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_cumsum_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_arg_reduction_scalar_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_default_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_empty_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_duplicate_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsorted_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_unsupported_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_multi_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_ndim_limit_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_none_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_offbounds_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_max_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mean_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_median_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_min_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_mode_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_nanmedian_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_norm_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_fns_fn_name_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_lastdim_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_lastdim_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_dim_reduction_less_than_64_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_keepdim_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_dim_single_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_empty_slice_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice__refs_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_any_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_count_nonzero_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_hash_tensor_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_linalg_vector_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_amax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_amin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_argmax_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_argmin_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_logsumexp_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_norm_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_masked_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_nanmean_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_nansum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_prod_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_std_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_std_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_sum_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_empty_tensor_nonempty_slice_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_histc_cuda, test/test_reductions.py::TestReductionsCUDA::test_histc_lowp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_histc_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_corner_cases_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_histc_min_max_errors_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_histc_value_corner_cases_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histc_value_corner_cases_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_histogram_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histogram_error_handling_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_histogramdd_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_identity_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_invalid_0dim_aminmax_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_invalid_0dim_aminmax_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logcumsumexp_complex_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_logcumsumexp_complex_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_logsumexp_integral_promotion_cuda, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_max_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_max_elementwise_cuda, test/test_reductions.py::TestReductionsCUDA::test_max_mixed_devices_cuda, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_max_with_inf_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mean_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_mean_int_with_optdtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mean_out_is_alias_of_return_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_corner_cases_cuda, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_median_nan_values_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_median_real_values_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_min_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_min_elementwise_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_max_nan_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_mixed_devices_cuda, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_min_with_inf_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_minmax_illegal_dtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_mode_boolean_cuda, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_mode_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_mode_large_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_mode_wrong_device_cuda, test/test_reductions.py::TestReductionsCUDA::test_mode_wrong_dtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_omit_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nan_policy_propagate_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nanmean_integral_types_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_complex_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_nansum_complex_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_nansum_out_dtype_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_nansum_vs_numpy_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_all_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_expanded_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_innermost_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_outermost_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_noncontiguous_transposed_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_numpy_named_args_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_bool_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_prod_gpu_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_prod_gpu_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_prod_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_prod_lowp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_prod_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_quantile_backward_cuda, test/test_reductions.py::TestReductionsCUDA::test_quantile_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_quantile_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_quantile_error_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduce_dtype_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_empty_any_all_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_split_cuda, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_input_corner_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reduction_vectorize_along_output_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_reductions_large_half_tensors_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_duplicate_values_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_extremal_values_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_1D_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_2D_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_large_input_64bit_indexing_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_scalar_input_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_ref_small_input_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_reference_masked_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_repeated_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype__refs_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_all_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_any_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_count_nonzero_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_hash_tensor_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_linalg_vector_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_amin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmax_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_argmin_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_logsumexp_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_norm_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_std_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_masked_var_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_mean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nanmean_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_nansum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_prod_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_std_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_sum_cuda_uint8, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_result_dtype_var_unbiased_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_scalar_tensor_as_dim_argument_cuda, test/test_reductions.py::TestReductionsCUDA::test_scalar_tensor_dim_compiled_mode_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_std_correction_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_std_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_all_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_std_mean_correction_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_std_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_mean_some_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_std_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_all_cuda_bool, test/test_reductions.py::TestReductionsCUDA::test_sum_all_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_cpu_device_mismatch_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_dim_reduction_uint8_overflow_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_integer_upcast_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_lowp_cuda_bfloat16, test/test_reductions.py::TestReductionsCUDA::test_sum_noncontig_lowp_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_sum_out_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_parallel_cuda, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float16, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int16, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int32, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int64, test/test_reductions.py::TestReductionsCUDA::test_sum_vs_numpy_cuda_int8, test/test_reductions.py::TestReductionsCUDA::test_tensor_compare_ops_argmax_argmix_kthvalue_dim_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_tensor_compare_ops_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_tensor_reduce_ops_empty_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_correction_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_var_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_dim_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_large_input_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_all_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_mean_correction_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_var_mean_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_mean_some_dims_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_stability2_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_stability_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_unbiased_cuda, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_var_vs_numpy_cuda_float64, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_complex128, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_complex64, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_float32, test/test_reductions.py::TestReductionsCUDA::test_warn_invalid_degrees_of_freedom_cuda_float64
2025-12-04T14:39:04.7356999Z 
2025-12-04T14:39:04.7357244Z Finished test_reductions 1/1 ... [2025-12-04 14:39:04.500020][20784.428313164], took 2.20min
2025-12-04T14:39:04.7357955Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_reductions/test_reductions-113ccd6215a90199.xml
2025-12-04T14:39:04.7358619Z Running test_autoload_enable 1/1 ... [2025-12-04 14:39:04.681408][20784.609703684]
2025-12-04T14:39:04.9835879Z Processing /var/lib/jenkins/workspace/test/cpp_extensions
2025-12-04T14:39:07.7525011Z   Preparing metadata (pyproject.toml) ... [?25l- done
2025-12-04T14:39:07.7547279Z [?25hBuilding wheels for collected packages: torch_test_cpp_extension
2025-12-04T14:40:22.1322983Z   Building wheel for torch_test_cpp_extension (pyproject.toml) ... [?25l- \ | / - \ | / - \ | / - \ | / - \ | / - done
2025-12-04T14:40:22.1441139Z [?25h  Created wheel for torch_test_cpp_extension: filename=torch_test_cpp_extension-0.0.0-cp310-cp310-linux_x86_64.whl size=13199649 sha256=de4334e69daa3b4e72f7ac2e1519bda7cdc82a85fc69a74402cb435a2aedcc2f
2025-12-04T14:40:22.1444525Z   Stored in directory: /tmp/pip-ephem-wheel-cache-tn7_57g2/wheels/2b/79/8d/635cf291e138cfea331292ca746c62b61fade208eb55a7e3a1
2025-12-04T14:40:22.1461023Z Successfully built torch_test_cpp_extension
2025-12-04T14:40:22.6045785Z Installing collected packages: torch_test_cpp_extension
2025-12-04T14:40:22.7814637Z Successfully installed torch_test_cpp_extension-0.0.0
2025-12-04T14:40:25.0833107Z 
2025-12-04T14:40:25.0833528Z Running tests...
2025-12-04T14:40:25.0833883Z ----------------------------------------------------------------------
2025-12-04T14:40:25.3810920Z .
2025-12-04T14:40:25.3811294Z ----------------------------------------------------------------------
2025-12-04T14:40:25.3811678Z Ran 1 test in 0.298s
2025-12-04T14:40:25.3811828Z 
2025-12-04T14:40:25.3811902Z OK
2025-12-04T14:40:25.3812004Z 
2025-12-04T14:40:25.3812103Z Generating XML reports...
2025-12-04T14:40:25.9521453Z Finished test_autoload_enable 1/1 ... [2025-12-04 14:40:25.951777][20865.880063863], took 1.35min
2025-12-04T14:40:25.9922743Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204144025.xml
2025-12-04T14:40:29.3754817Z Running test batch 'tests to run' cost 20012.38 seconds
2025-12-04T14:40:29.3766857Z Emitting td_test_failure_stats_v2
2025-12-04T14:40:29.3770426Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764859229_2df5eaaad11f11f0bade0242ac110002
2025-12-04T14:40:29.4935226Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764859229_2df5eaaad11f11f0bade0242ac110002 
2025-12-04T14:40:29.4945576Z Emitting td_test_failure_stats_v2
2025-12-04T14:40:29.4947714Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764859229_2e07e282d11f11f0bade0242ac110002
2025-12-04T14:40:29.5265362Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764859229_2e07e282d11f11f0bade0242ac110002 
2025-12-04T14:40:29.5266605Z inductor/test_cuda_select_algorithm 1/1 failed!
2025-12-04T14:40:29.5267124Z inductor/test_fp8 1/1 failed!
2025-12-04T14:40:30.1188880Z 
2025-12-04T14:40:30.1189327Z real	333m37.579s
2025-12-04T14:40:30.1189720Z user	320m38.892s
2025-12-04T14:40:30.1190063Z sys	40m58.521s
2025-12-04T14:40:30.1190749Z + sccache_epilogue
2025-12-04T14:40:30.1191070Z + echo '::group::Sccache Compilation Log'
2025-12-04T14:40:30.1191688Z ##[group]Sccache Compilation Log
2025-12-04T14:40:30.1192020Z + echo '=================== sccache compilation log ==================='
2025-12-04T14:40:30.1192403Z =================== sccache compilation log ===================
2025-12-04T14:40:30.1192972Z + python /var/lib/jenkins/workspace/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log
2025-12-04T14:40:30.1312919Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
2025-12-04T14:40:30.1313989Z =========== If your build fails, please take a look at the log above for possible reasons ===========
2025-12-04T14:40:30.1314470Z + sccache --show-stats
2025-12-04T14:40:30.1338353Z Compile requests                   4236
2025-12-04T14:40:30.1338774Z Compile requests executed           556
2025-12-04T14:40:30.1339143Z Cache hits                          281
2025-12-04T14:40:30.1339518Z Cache hits (C/C++)                  281
2025-12-04T14:40:30.1339751Z Cache misses                        273
2025-12-04T14:40:30.1339964Z Cache misses (C/C++)                273
2025-12-04T14:40:30.1340191Z Cache hits rate                   50.72 %
2025-12-04T14:40:30.1340415Z Cache hits rate (C/C++)           50.72 %
2025-12-04T14:40:30.1340624Z Cache timeouts                        0
2025-12-04T14:40:30.1340841Z Cache read errors                     0
2025-12-04T14:40:30.1341048Z Forced recaches                       0
2025-12-04T14:40:30.1341251Z Cache write errors                    0
2025-12-04T14:40:30.1341459Z Cache errors                          0
2025-12-04T14:40:30.1341835Z Compilations                        273
2025-12-04T14:40:30.1342053Z Compilation failures                  2
2025-12-04T14:40:30.1342272Z Non-cacheable compilations            0
2025-12-04T14:40:30.1342491Z Non-cacheable calls                 299
2025-12-04T14:40:30.1342845Z Non-compilation calls              3381
2025-12-04T14:40:30.1343443Z Unsupported compiler calls            0
2025-12-04T14:40:30.1343824Z Average cache write               0.048 s
2025-12-04T14:40:30.1344059Z Average compiler                  5.008 s
2025-12-04T14:40:30.1344275Z Average cache read hit            0.027 s
2025-12-04T14:40:30.1344513Z Failed distributed compilations       0
2025-12-04T14:40:30.1344669Z 
2025-12-04T14:40:30.1344749Z Non-cacheable reasons:
2025-12-04T14:40:30.1344935Z unknown source language             241
2025-12-04T14:40:30.1345143Z -E                                   58
2025-12-04T14:40:30.1345297Z 
2025-12-04T14:40:30.1345474Z Cache location                  s3, name: ossci-compiler-cache-circleci-v2, prefix: /
2025-12-04T14:40:30.1345804Z Version (client)                0.10.0
2025-12-04T14:40:30.1346019Z + sccache --stop-server
2025-12-04T14:40:30.1360096Z Stopping sccache server...
2025-12-04T14:40:30.1362971Z Compile requests                   4236
2025-12-04T14:40:30.1363228Z Compile requests executed           556
2025-12-04T14:40:30.1363474Z Cache hits                          281
2025-12-04T14:40:30.1363687Z Cache hits (C/C++)                  281
2025-12-04T14:40:30.1363907Z Cache misses                        273
2025-12-04T14:40:30.1364105Z Cache misses (C/C++)                273
2025-12-04T14:40:30.1364335Z Cache hits rate                   50.72 %
2025-12-04T14:40:30.1364559Z Cache hits rate (C/C++)           50.72 %
2025-12-04T14:40:30.1364765Z Cache timeouts                        0
2025-12-04T14:40:30.1364988Z Cache read errors                     0
2025-12-04T14:40:30.1365200Z Forced recaches                       0
2025-12-04T14:40:30.1365398Z Cache write errors                    0
2025-12-04T14:40:30.1365607Z Cache errors                          0
2025-12-04T14:40:30.1365812Z Compilations                        273
2025-12-04T14:40:30.1366038Z Compilation failures                  2
2025-12-04T14:40:30.1366255Z Non-cacheable compilations            0
2025-12-04T14:40:30.1366722Z Non-cacheable calls                 299
2025-12-04T14:40:30.1367136Z Non-compilation calls              3381
2025-12-04T14:40:30.1367528Z Unsupported compiler calls            0
2025-12-04T14:40:30.1367809Z Average cache write               0.048 s
2025-12-04T14:40:30.1368040Z Average compiler                  5.008 s
2025-12-04T14:40:30.1368259Z Average cache read hit            0.027 s
2025-12-04T14:40:30.1368486Z Failed distributed compilations       0
2025-12-04T14:40:30.1368634Z 
2025-12-04T14:40:30.1368711Z Non-cacheable reasons:
2025-12-04T14:40:30.1368899Z unknown source language             241
2025-12-04T14:40:30.1369115Z -E                                   58
2025-12-04T14:40:30.1369262Z 
2025-12-04T14:40:30.1369435Z Cache location                  s3, name: ossci-compiler-cache-circleci-v2, prefix: /
2025-12-04T14:40:30.1369762Z Version (client)                0.10.0
2025-12-04T14:40:30.1369990Z + echo ::endgroup::
2025-12-04T14:40:30.1370357Z ##[endgroup]
2025-12-04T14:40:30.1370514Z + cleanup_workspace
2025-12-04T14:40:30.1370883Z + echo 'sudo may print the following warning message that can be ignored. The chown command will still run.'
2025-12-04T14:40:30.1371432Z sudo may print the following warning message that can be ignored. The chown command will still run.
2025-12-04T14:40:30.1371873Z + echo '    sudo: setrlimit(RLIMIT_STACK): Operation not permitted'
2025-12-04T14:40:30.1372191Z     sudo: setrlimit(RLIMIT_STACK): Operation not permitted
2025-12-04T14:40:30.1372575Z + echo 'For more details refer to https://github.com/sudo-project/sudo/issues/42'
2025-12-04T14:40:30.1372979Z For more details refer to https://github.com/sudo-project/sudo/issues/42
2025-12-04T14:40:30.1373376Z + sudo chown -R 1000 /var/lib/jenkins/workspace
2025-12-04T14:40:31.1379206Z ##[error]Process completed with exit code 1.
2025-12-04T14:40:31.1446367Z Prepare all required actions
2025-12-04T14:40:31.1446718Z Getting action download info
2025-12-04T14:40:31.3202645Z ##[group]Run ./.github/actions/pytest-cache-upload
2025-12-04T14:40:31.3203086Z with:
2025-12-04T14:40:31.3203265Z   cache_dir: .pytest_cache
2025-12-04T14:40:31.3203462Z   shard: 2
2025-12-04T14:40:31.3203651Z   sha: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T14:40:31.3203902Z   test_config: default
2025-12-04T14:40:31.3204146Z   job_identifier: trunk_linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T14:40:31.3204415Z env:
2025-12-04T14:40:31.3204576Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:31.3204775Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:31.3205018Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:31.3205431Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:31.3205801Z ##[endgroup]
2025-12-04T14:40:31.3233780Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T14:40:31.3234033Z with:
2025-12-04T14:40:31.3234181Z   shell: bash
2025-12-04T14:40:31.3234350Z   timeout_minutes: 5
2025-12-04T14:40:31.3234520Z   max_attempts: 5
2025-12-04T14:40:31.3234698Z   retry_wait_seconds: 30
2025-12-04T14:40:31.3234958Z   command: set -eu
python3 -m pip install boto3==1.35.42

2025-12-04T14:40:31.3235233Z   polling_interval_seconds: 1
2025-12-04T14:40:31.3235432Z   warning_on_retry: true
2025-12-04T14:40:31.3235609Z   continue_on_error: false
2025-12-04T14:40:31.3235796Z env:
2025-12-04T14:40:31.3235945Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:31.3236125Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:31.3236349Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:31.3236734Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:31.3237075Z ##[endgroup]
2025-12-04T14:40:31.7550283Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T14:40:32.8388335Z Collecting boto3==1.35.42
2025-12-04T14:40:32.8535363Z   Downloading boto3-1.35.42-py3-none-any.whl (139 kB)
2025-12-04T14:40:32.8672235Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/lib/python3.9/site-packages (from boto3==1.35.42) (0.10.0)
2025-12-04T14:40:33.9980749Z Collecting botocore<1.36.0,>=1.35.42
2025-12-04T14:40:34.0011654Z   Downloading botocore-1.35.99-py3-none-any.whl (13.3 MB)
2025-12-04T14:40:34.1992171Z Collecting s3transfer<0.11.0,>=0.10.0
2025-12-04T14:40:34.2026404Z   Downloading s3transfer-0.10.4-py3-none-any.whl (83 kB)
2025-12-04T14:40:34.2107229Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (2.8.1)
2025-12-04T14:40:34.2116790Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.25.10)
2025-12-04T14:40:34.4021511Z Requirement already satisfied: six>=1.5 in /usr/lib/python3.9/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.15.0)
2025-12-04T14:40:34.4818333Z Installing collected packages: botocore, s3transfer, boto3
2025-12-04T14:40:35.0402508Z Successfully installed boto3-1.35.42 botocore-1.35.99 s3transfer-0.10.4
2025-12-04T14:40:35.3958727Z Command completed after 1 attempt(s).
2025-12-04T14:40:35.4019347Z ##[group]Run python3 .github/scripts/pytest_cache.py \
2025-12-04T14:40:35.4019686Z [36;1mpython3 .github/scripts/pytest_cache.py \[0m
2025-12-04T14:40:35.4019951Z [36;1m  --upload \[0m
2025-12-04T14:40:35.4020179Z [36;1m  --cache_dir "$GITHUB_WORKSPACE/$CACHE_DIR" \[0m
2025-12-04T14:40:35.4020450Z [36;1m  --pr_identifier "$GITHUB_REF" \[0m
2025-12-04T14:40:35.4020714Z [36;1m  --job_identifier "$JOB_IDENTIFIER" \[0m
2025-12-04T14:40:35.4020951Z [36;1m  --sha "$SHA" \[0m
2025-12-04T14:40:35.4021149Z [36;1m  --test_config "$TEST_CONFIG" \[0m
2025-12-04T14:40:35.4021393Z [36;1m  --shard "$SHARD" \[0m
2025-12-04T14:40:35.4021604Z [36;1m  --repo "$REPO" \[0m
2025-12-04T14:40:35.4022010Z [36;1m  --temp_dir "$RUNNER_TEMP" \[0m
2025-12-04T14:40:35.4034056Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T14:40:35.4034337Z env:
2025-12-04T14:40:35.4034694Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:35.4034893Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:35.4035132Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:35.4035527Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:35.4035896Z   CACHE_DIR: .pytest_cache
2025-12-04T14:40:35.4036135Z   JOB_IDENTIFIER: trunk_linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T14:40:35.4036427Z   SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T14:40:35.4036666Z   TEST_CONFIG: default
2025-12-04T14:40:35.4036830Z   SHARD: 2
2025-12-04T14:40:35.4036988Z   REPO: pytorch/pytorch
2025-12-04T14:40:35.4037168Z ##[endgroup]
2025-12-04T14:40:35.8337745Z PR identifier for `refs/heads/main` is `96e092540d6b3c4076e3d2bc6f1f9013`
2025-12-04T14:40:35.8339556Z Uploading cache with args Namespace(upload=True, download=False, cache_dir='/home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache', pr_identifier='refs/heads/main', job_identifier='trunk_linux-jammy-cuda12.8-py3.10-gcc11', sha='ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32', test_config='default', shard='2', repo='pytorch/pytorch', temp_dir='/home/ec2-user/actions-runner/_work/_temp', bucket=None)
2025-12-04T14:40:35.8341498Z Zipping /home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache
2025-12-04T14:40:35.8342595Z      to /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/default/2
2025-12-04T14:40:35.8344026Z Uploading /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/default/2.zip
2025-12-04T14:40:35.8345267Z        to s3://gha-artifacts/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/default/2.zip
2025-12-04T14:40:35.8713554Z ##[group]Run cat test/**/*_toprint.log || true
2025-12-04T14:40:35.8713893Z [36;1mcat test/**/*_toprint.log || true[0m
2025-12-04T14:40:35.8722279Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T14:40:35.8722559Z env:
2025-12-04T14:40:35.8722721Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:35.8722906Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:35.8723136Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:35.8723544Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:35.8723904Z ##[endgroup]
2025-12-04T14:40:35.8827423Z cat: 'test/**/*_toprint.log': No such file or directory
2025-12-04T14:40:35.8853124Z ##[group]Run kill "$MONITOR_SCRIPT_PID"
2025-12-04T14:40:35.8853428Z [36;1mkill "$MONITOR_SCRIPT_PID"[0m
2025-12-04T14:40:35.8860384Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T14:40:35.8860659Z env:
2025-12-04T14:40:35.8860817Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:35.8861017Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:35.8861248Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:35.8861654Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:35.8862020Z   MONITOR_SCRIPT_PID: 58792
2025-12-04T14:40:35.8862209Z ##[endgroup]
2025-12-04T14:40:35.8887446Z /home/ec2-user/actions-runner/_work/_temp/76cb5f06-528c-4af5-8335-b307bb0b82bb.sh: line 1: kill: (58792) - No such process
2025-12-04T14:40:35.8890098Z ##[error]Process completed with exit code 1.
2025-12-04T14:40:35.8982701Z Prepare all required actions
2025-12-04T14:40:35.8983073Z Getting action download info
2025-12-04T14:40:36.1137940Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T14:40:36.3212644Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02)
2025-12-04T14:40:36.6503185Z ##[group]Run ./.github/actions/upload-test-artifacts
2025-12-04T14:40:36.6503648Z with:
2025-12-04T14:40:36.6503965Z   file-suffix: test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862
2025-12-04T14:40:36.6504349Z   s3-bucket: gha-artifacts
2025-12-04T14:40:36.6504548Z env:
2025-12-04T14:40:36.6504706Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:36.6504908Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:36.6505155Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:36.6505575Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:36.6505964Z ##[endgroup]
2025-12-04T14:40:36.6526599Z ##[group]Run # Remove any previous test jsons if they exist
2025-12-04T14:40:36.6526937Z [36;1m# Remove any previous test jsons if they exist[0m
2025-12-04T14:40:36.6527204Z [36;1mrm -f test-jsons-*.zip[0m
2025-12-04T14:40:36.6527513Z [36;1mzip -r "test-jsons-${FILE_SUFFIX}.zip" test/test-reports -i '*.json'[0m
2025-12-04T14:40:36.6535179Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T14:40:36.6535468Z env:
2025-12-04T14:40:36.6535626Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:36.6535817Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:36.6536041Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:36.6536436Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:36.6536933Z   FILE_SUFFIX: test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862
2025-12-04T14:40:36.6537283Z ##[endgroup]
2025-12-04T14:40:36.6762158Z   adding: test/test-reports/td_exclusions-c7fdf23ebe9f44ecabec.json (deflated 82%)
2025-12-04T14:40:36.6773936Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-d77224b10dd1e10b.json (deflated 94%)
2025-12-04T14:40:36.6840839Z   adding: test/test-reports/python-pytest/dynamo.test_repros/dynamo.test_repros-87366e2d7057b5b0.json (deflated 92%)
2025-12-04T14:40:36.6841749Z   adding: test/test-reports/python-pytest/inductor.test_flex_attention/inductor.test_flex_attention-f32aa134ae4d7a45.json (deflated 94%)
2025-12-04T14:40:36.6842633Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1d5e50d3220be84.json (deflated 87%)
2025-12-04T14:40:36.6843530Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-68bd725ac012aaf6.json (deflated 86%)
2025-12-04T14:40:36.6844408Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f43d696b91c68e27.json (deflated 86%)
2025-12-04T14:40:36.6845271Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32d3a27d38e00e52.json (deflated 87%)
2025-12-04T14:40:36.6846132Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d971c2b5fa40f28c.json (deflated 86%)
2025-12-04T14:40:36.6847001Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d4e5ee130381ea3.json (deflated 86%)
2025-12-04T14:40:36.6847857Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7f039b6301f03638.json (deflated 87%)
2025-12-04T14:40:36.6848714Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7547e2319a805dd.json (deflated 86%)
2025-12-04T14:40:36.6849802Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f314cd6b44b1cdb.json (deflated 86%)
2025-12-04T14:40:36.6850711Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-31537f65aa77d4f4.json (deflated 87%)
2025-12-04T14:40:36.6851586Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11e8fb2fd4357c15.json (deflated 86%)
2025-12-04T14:40:36.6852585Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-75666f3891d9ac7f.json (deflated 86%)
2025-12-04T14:40:36.6853445Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bbf4ef91870a527.json (deflated 87%)
2025-12-04T14:40:36.6854312Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-144df5003ab71cee.json (deflated 86%)
2025-12-04T14:40:36.6855171Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5fd5f82c697f5c0c.json (deflated 86%)
2025-12-04T14:40:36.6856030Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93ba75b7427cf884.json (deflated 87%)
2025-12-04T14:40:36.6856879Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db3472bddf12b7a7.json (deflated 86%)
2025-12-04T14:40:36.6857808Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-97d0cdfaafee5426.json (deflated 86%)
2025-12-04T14:40:36.6858775Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3a149555401c32cc.json (deflated 87%)
2025-12-04T14:40:36.6859713Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-64ab9f5424c5493f.json (deflated 86%)
2025-12-04T14:40:36.6860729Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bdae605562476ceb.json (deflated 86%)
2025-12-04T14:40:36.6861723Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b2bbf25d96b76c9b.json (deflated 87%)
2025-12-04T14:40:36.6862673Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5ac824c12758af27.json (deflated 86%)
2025-12-04T14:40:36.6863682Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5a8939c38696fa6e.json (deflated 86%)
2025-12-04T14:40:36.6864619Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8de08e52169132e4.json (deflated 87%)
2025-12-04T14:40:36.6865566Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c872303ed892824.json (deflated 86%)
2025-12-04T14:40:36.6866558Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86972921d28d1709.json (deflated 86%)
2025-12-04T14:40:36.6867520Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-08f8aa88da0d4c3d.json (deflated 87%)
2025-12-04T14:40:36.6868460Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6e5694a381ab599.json (deflated 86%)
2025-12-04T14:40:36.6869434Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a1c1c2119d10732c.json (deflated 86%)
2025-12-04T14:40:36.6870401Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c94947c9bb46a4e.json (deflated 87%)
2025-12-04T14:40:36.6871460Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e2657ebcfa165043.json (deflated 86%)
2025-12-04T14:40:36.6872492Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e5a9540a53f5bbd7.json (deflated 86%)
2025-12-04T14:40:36.6873437Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f5535d6178d67f54.json (deflated 87%)
2025-12-04T14:40:36.6874441Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-839913cdd4a5fdb2.json (deflated 86%)
2025-12-04T14:40:36.6875475Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca344a44fcbdba6a.json (deflated 86%)
2025-12-04T14:40:36.6876435Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d9537209be9ce80.json (deflated 87%)
2025-12-04T14:40:36.6877457Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b6de87f4ee6a6c38.json (deflated 86%)
2025-12-04T14:40:36.6878407Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-462df064e3458fc9.json (deflated 86%)
2025-12-04T14:40:36.6879322Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f351581eb409e8d.json (deflated 87%)
2025-12-04T14:40:36.6880454Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-94c0e5e2bee831c2.json (deflated 86%)
2025-12-04T14:40:36.6881407Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7a973581a4e2c554.json (deflated 86%)
2025-12-04T14:40:36.6882363Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-518d2a063958b0ac.json (deflated 87%)
2025-12-04T14:40:36.6883381Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1cf5b0397cd79e9.json (deflated 86%)
2025-12-04T14:40:36.6884313Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b693ef47858459cd.json (deflated 86%)
2025-12-04T14:40:36.6885286Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c603aefabd564f6f.json (deflated 87%)
2025-12-04T14:40:36.6886301Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-476ed3473033d71c.json (deflated 86%)
2025-12-04T14:40:36.6887253Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38079583fa3f76bd.json (deflated 86%)
2025-12-04T14:40:36.6888262Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc6fbe2f84088a12.json (deflated 87%)
2025-12-04T14:40:36.6889202Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a9efcd8b80cecd97.json (deflated 86%)
2025-12-04T14:40:36.6890240Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a00be3c10f587c4d.json (deflated 86%)
2025-12-04T14:40:36.6891228Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21bfb76ef730b721.json (deflated 87%)
2025-12-04T14:40:36.6892212Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51fe451ae52d8ee9.json (deflated 86%)
2025-12-04T14:40:36.6893299Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7e2b876b221ae6e.json (deflated 86%)
2025-12-04T14:40:36.6894281Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a296a1ae2f954511.json (deflated 87%)
2025-12-04T14:40:36.6895239Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-173808d08d9ed556.json (deflated 86%)
2025-12-04T14:40:36.6896164Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0128790a7f0c548c.json (deflated 86%)
2025-12-04T14:40:36.6897279Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2efc3529636beb3d.json (deflated 87%)
2025-12-04T14:40:36.6898233Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5778c6a42245e5c5.json (deflated 86%)
2025-12-04T14:40:36.6899133Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6cbd45f232782bc2.json (deflated 86%)
2025-12-04T14:40:36.6900243Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d890e4a6cbb89712.json (deflated 87%)
2025-12-04T14:40:36.6901186Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b4568ad5eb5915b3.json (deflated 86%)
2025-12-04T14:40:36.6902177Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ebe2595646f336e.json (deflated 86%)
2025-12-04T14:40:36.6903122Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cfc45be16d95a5ee.json (deflated 87%)
2025-12-04T14:40:36.6904068Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4b3d7c6eebbf264b.json (deflated 86%)
2025-12-04T14:40:36.6905069Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7323c5eff762fde9.json (deflated 86%)
2025-12-04T14:40:36.6906058Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db81609099e15efb.json (deflated 87%)
2025-12-04T14:40:36.6907004Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8169c375ae58c76b.json (deflated 86%)
2025-12-04T14:40:36.6907957Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ac75bac96a56365f.json (deflated 86%)
2025-12-04T14:40:36.6908934Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-674ad3938f78a3d3.json (deflated 87%)
2025-12-04T14:40:36.6909884Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f085b783f0e405ac.json (deflated 86%)
2025-12-04T14:40:36.6910891Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-84307678eab5d217.json (deflated 86%)
2025-12-04T14:40:36.6911898Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93f441dcac87b0dc.json (deflated 87%)
2025-12-04T14:40:36.6912786Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-093a939a7121f539.json (deflated 86%)
2025-12-04T14:40:36.6913814Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ed7e6d31e19a7f77.json (deflated 86%)
2025-12-04T14:40:36.6914762Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fcb36d28ba877da8.json (deflated 87%)
2025-12-04T14:40:36.6915864Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb130a6ac4c4d42.json (deflated 86%)
2025-12-04T14:40:36.6916835Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d436a35a57eaea90.json (deflated 86%)
2025-12-04T14:40:36.6918060Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76b1db4df066ac09.json (deflated 87%)
2025-12-04T14:40:36.6919262Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bbaa588317639c61.json (deflated 86%)
2025-12-04T14:40:36.6920303Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7cb6908bcfc4804b.json (deflated 86%)
2025-12-04T14:40:36.6921277Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cd6d9f99b37f4011.json (deflated 87%)
2025-12-04T14:40:36.6922363Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d059803612c07abe.json (deflated 86%)
2025-12-04T14:40:36.6923291Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d2f99eb08b618a0a.json (deflated 86%)
2025-12-04T14:40:36.6924260Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76dedcabb72bb30d.json (deflated 87%)
2025-12-04T14:40:36.6925269Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d102c48975f66f00.json (deflated 86%)
2025-12-04T14:40:36.6926231Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e12a02efbce3f8f2.json (deflated 86%)
2025-12-04T14:40:36.6927143Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-835df1857998cf06.json (deflated 87%)
2025-12-04T14:40:36.6928166Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b90dc48e94da60a1.json (deflated 86%)
2025-12-04T14:40:36.6929165Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a5fa72618c2406.json (deflated 86%)
2025-12-04T14:40:36.6930166Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32c3413eac3481c3.json (deflated 87%)
2025-12-04T14:40:36.6931138Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b9498a5ec773296.json (deflated 86%)
2025-12-04T14:40:36.6932076Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d690a534f220c503.json (deflated 86%)
2025-12-04T14:40:36.6933065Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8635fba9f5b5afed.json (deflated 87%)
2025-12-04T14:40:36.6934109Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2adccf8b9e051d5a.json (deflated 86%)
2025-12-04T14:40:36.6935069Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50234d62b4ab45ea.json (deflated 86%)
2025-12-04T14:40:36.6936073Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fda8ac892cff9b52.json (deflated 87%)
2025-12-04T14:40:36.6937036Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6daec75554d576a1.json (deflated 86%)
2025-12-04T14:40:36.6937993Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7357125e19fc0b47.json (deflated 86%)
2025-12-04T14:40:36.6939160Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1372e7af4dc93064.json (deflated 87%)
2025-12-04T14:40:36.6940135Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1890de1440f6da93.json (deflated 86%)
2025-12-04T14:40:36.6941072Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed71c1109750bb2.json (deflated 86%)
2025-12-04T14:40:36.6942171Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b4b98b76b112369.json (deflated 87%)
2025-12-04T14:40:36.6943128Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fc4f9e9eb787f925.json (deflated 86%)
2025-12-04T14:40:36.6944141Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cf10e80d579ed1a1.json (deflated 86%)
2025-12-04T14:40:36.6945169Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f768cb22e37c95bb.json (deflated 87%)
2025-12-04T14:40:36.6946119Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4cd3ee76d86b3b2d.json (deflated 86%)
2025-12-04T14:40:36.6947065Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dd5744bb7f1104d.json (deflated 86%)
2025-12-04T14:40:36.6948059Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8f8bda0471bacaab.json (deflated 87%)
2025-12-04T14:40:36.6949007Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-39442f28ac15f7dd.json (deflated 86%)
2025-12-04T14:40:36.6950019Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b5211a64a27fb03.json (deflated 86%)
2025-12-04T14:40:36.6950929Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-03811a38f7309b37.json (deflated 87%)
2025-12-04T14:40:36.6951866Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c3f4a82c64f8b823.json (deflated 86%)
2025-12-04T14:40:36.6952879Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c722331da90a17a1.json (deflated 86%)
2025-12-04T14:40:36.6953829Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-715dcfb7265e7117.json (deflated 87%)
2025-12-04T14:40:36.6954851Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af0a42bb02245e10.json (deflated 86%)
2025-12-04T14:40:36.6955824Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5a947cb713f2103.json (deflated 86%)
2025-12-04T14:40:36.6956778Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c9a860fbca8c784e.json (deflated 87%)
2025-12-04T14:40:36.6957728Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-57d06208bb64cb40.json (deflated 86%)
2025-12-04T14:40:36.6958746Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-27d39a08641974ca.json (deflated 86%)
2025-12-04T14:40:36.6959688Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c115897706ac37ea.json (deflated 87%)
2025-12-04T14:40:36.6960692Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-99c6159c4eb555cf.json (deflated 86%)
2025-12-04T14:40:36.6961789Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71859eedfe6269a5.json (deflated 86%)
2025-12-04T14:40:36.6962764Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b7ad6bc433aca4f5.json (deflated 87%)
2025-12-04T14:40:36.6963765Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb71453e3d3b813.json (deflated 86%)
2025-12-04T14:40:36.6964771Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1f8f7752fccd9869.json (deflated 86%)
2025-12-04T14:40:36.6965806Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b45e15b9b3058993.json (deflated 87%)
2025-12-04T14:40:36.6966825Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ab51729f4958ddc5.json (deflated 86%)
2025-12-04T14:40:36.6967791Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c75b79372dbd5cd7.json (deflated 86%)
2025-12-04T14:40:36.6968746Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6e05e1cd235f382.json (deflated 87%)
2025-12-04T14:40:36.6969739Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b5713604e4d5a687.json (deflated 86%)
2025-12-04T14:40:36.6970709Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98fe1568229d1f43.json (deflated 86%)
2025-12-04T14:40:36.6971632Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-89a0569137f2a5f8.json (deflated 87%)
2025-12-04T14:40:36.6972637Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-26852b57f22709e5.json (deflated 86%)
2025-12-04T14:40:36.6973598Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51aaf4e0af1c22f7.json (deflated 86%)
2025-12-04T14:40:36.6974503Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dc138e7c3d90d405.json (deflated 87%)
2025-12-04T14:40:36.6975535Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11f1088c00e16c8c.json (deflated 86%)
2025-12-04T14:40:36.6976552Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3523d5aaa7729d0c.json (deflated 86%)
2025-12-04T14:40:36.6977526Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-70de31050b612090.json (deflated 87%)
2025-12-04T14:40:36.6978503Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96a27193d0a2e839.json (deflated 86%)
2025-12-04T14:40:36.6979424Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc5a2675f46e34d3.json (deflated 86%)
2025-12-04T14:40:36.6980406Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-37ac0a15b5eff353.json (deflated 87%)
2025-12-04T14:40:36.6981393Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a5da48d7d65453d4.json (deflated 86%)
2025-12-04T14:40:36.6982321Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dba8e879764f929.json (deflated 86%)
2025-12-04T14:40:36.6983457Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0d4d42e91b0ff091.json (deflated 87%)
2025-12-04T14:40:36.6984377Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-659fbe1db9f9f989.json (deflated 86%)
2025-12-04T14:40:36.6985319Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0f9da1e77120ab8a.json (deflated 86%)
2025-12-04T14:40:36.6986350Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-91b8af2bf22e5dbf.json (deflated 87%)
2025-12-04T14:40:36.6987454Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6902f75d647c91e7.json (deflated 86%)
2025-12-04T14:40:36.6988453Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bee3dd53feb5961.json (deflated 86%)
2025-12-04T14:40:36.6989393Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-17bc86173edb9567.json (deflated 87%)
2025-12-04T14:40:36.6990339Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-12cbce7f716a0669.json (deflated 86%)
2025-12-04T14:40:36.6991342Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71532e1cbeaa1931.json (deflated 86%)
2025-12-04T14:40:36.6992309Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1cbc5ac56a047f28.json (deflated 87%)
2025-12-04T14:40:36.6993227Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a3505d51f13f273.json (deflated 86%)
2025-12-04T14:40:36.6994229Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4bc023f248c82374.json (deflated 86%)
2025-12-04T14:40:36.6995194Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-80114f319d6e3dd1.json (deflated 87%)
2025-12-04T14:40:36.6996163Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5d4e24682433b20.json (deflated 86%)
2025-12-04T14:40:36.6997154Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1259359197037313.json (deflated 86%)
2025-12-04T14:40:36.6998139Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50141705a26d91cc.json (deflated 87%)
2025-12-04T14:40:36.6999104Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-032da2f374cad8bd.json (deflated 86%)
2025-12-04T14:40:36.7000234Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0bdf2ccaad64a4e2.json (deflated 86%)
2025-12-04T14:40:36.7001211Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f7b942a83386066d.json (deflated 87%)
2025-12-04T14:40:36.7002151Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96d9c65e819c8d75.json (deflated 86%)
2025-12-04T14:40:36.7003123Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86387a3ec48a5612.json (deflated 86%)
2025-12-04T14:40:36.7004100Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86b90fcd4d18651c.json (deflated 87%)
2025-12-04T14:40:36.7005038Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c3b1417ea80e2f0.json (deflated 86%)
2025-12-04T14:40:36.7006155Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7815a5e2a911334a.json (deflated 86%)
2025-12-04T14:40:36.7007078Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8773df7cdfc9f682.json (deflated 87%)
2025-12-04T14:40:36.7008017Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f18b468d408a9813.json (deflated 86%)
2025-12-04T14:40:36.7009181Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ada8ba8d71fda760.json (deflated 86%)
2025-12-04T14:40:36.7010136Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6bbd4f6ab2b6130.json (deflated 87%)
2025-12-04T14:40:36.7011128Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6b9751c3a5f583fd.json (deflated 86%)
2025-12-04T14:40:36.7012086Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f1e64a16331aaa14.json (deflated 86%)
2025-12-04T14:40:36.7013032Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2b30602e906f7649.json (stored 0%)
2025-12-04T14:40:36.7040904Z   adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-2cb94ab29d3c0df8.json (deflated 97%)
2025-12-04T14:40:36.7048698Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-ab5aa4d4069f84fb.json (deflated 95%)
2025-12-04T14:40:36.7056431Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-35f77a6004c12574.json (deflated 95%)
2025-12-04T14:40:36.7063732Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-4e86df63b0f31fa2.json (deflated 95%)
2025-12-04T14:40:36.7071650Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-421be54fd2475226.json (deflated 95%)
2025-12-04T14:40:36.7079326Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-269f711d29febed7.json (deflated 95%)
2025-12-04T14:40:36.7151742Z   adding: test/test-reports/python-pytest/test_ops/test_ops-f35e359ea3f52347.json (deflated 96%)
2025-12-04T14:40:36.7236766Z   adding: test/test-reports/python-pytest/test_ops/test_ops-7327fc5de50caef8.json (deflated 96%)
2025-12-04T14:40:36.7270124Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-e6d2768dce09d0dd.json (deflated 94%)
2025-12-04T14:40:36.7275467Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-52326583abfcb307.json (deflated 96%)
2025-12-04T14:40:36.7280870Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-b0e5cfa73b17bf79.json (deflated 96%)
2025-12-04T14:40:36.7286377Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-44c334397fb0c3bd.json (deflated 96%)
2025-12-04T14:40:36.7292781Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3098e6f6c63481df.json (deflated 92%)
2025-12-04T14:40:36.7327074Z   adding: test/test-reports/python-pytest/inductor.test_compiled_autograd/inductor.test_compiled_autograd-d8fc516c8be54fc6.json (deflated 93%)
2025-12-04T14:40:36.7328114Z   adding: test/test-reports/python-pytest/inductor.test_layout_optim/inductor.test_layout_optim-ff0e0fc528f4f3dd.json (stored 0%)
2025-12-04T14:40:36.7342831Z   adding: test/test-reports/python-pytest/dynamo.test_exc/dynamo.test_exc-59dcc92175511a1b.json (deflated 95%)
2025-12-04T14:40:36.7350521Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor_arrayref/inductor.test_aot_inductor_arrayref-c35059cecd7c3b99.json (deflated 95%)
2025-12-04T14:40:36.7351836Z   adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccee7b90c33901e0.json (deflated 88%)
2025-12-04T14:40:36.7352859Z   adding: test/test-reports/python-pytest/dynamo.test_deque_reconstruct/dynamo.test_deque_reconstruct-4527efee43b2418d.json (deflated 76%)
2025-12-04T14:40:36.7353815Z   adding: test/test-reports/python-pytest/inductor.test_inductor_annotations/inductor.test_inductor_annotations-1bfa13dfa66ba37a.json (deflated 72%)
2025-12-04T14:40:36.7354649Z   adding: test/test-reports/python-pytest/inductor.test_compile_worker/inductor.test_compile_worker-d291715b5fb08603.json (deflated 92%)
2025-12-04T14:40:36.7355547Z   adding: test/test-reports/python-pytest/dynamo.test_fx_passes_pre_grad/dynamo.test_fx_passes_pre_grad-8f5f76cf24e3a322.json (deflated 34%)
2025-12-04T14:40:36.7356275Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dccd0f4af0dde98e.json (deflated 93%)
2025-12-04T14:40:36.7356935Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-70b63fabd52069fd.json (deflated 85%)
2025-12-04T14:40:36.7357579Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8f8e73b1ecdff271.json (deflated 85%)
2025-12-04T14:40:36.7361109Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b260dbfbe2039817.json (deflated 94%)
2025-12-04T14:40:36.7361781Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6d60c9510a0f37ea.json (deflated 74%)
2025-12-04T14:40:36.7368831Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3a4ab85c9fd562b7.json (deflated 97%)
2025-12-04T14:40:36.7370666Z   adding: test/test-reports/python-pytest/inductor.test_flex_flash/inductor.test_flex_flash-594b02277dbafddb.json (deflated 97%)
2025-12-04T14:40:36.7371439Z   adding: test/test-reports/python-pytest/inductor.test_segmented_tree/inductor.test_segmented_tree-3edcd06141f9e439.json (deflated 90%)
2025-12-04T14:40:36.7372275Z   adding: test/test-reports/python-pytest/inductor.test_kernel_optimization/inductor.test_kernel_optimization-b711ca038cf9dded.json (deflated 38%)
2025-12-04T14:40:36.7373067Z   adding: test/test-reports/python-pytest/inductor.test_metrics/inductor.test_metrics-b803ef0c4a9491e7.json (deflated 78%)
2025-12-04T14:40:36.7373867Z   adding: test/test-reports/python-pytest/export.test_unflatten_training_ir/export.test_unflatten_training_ir-0b06d16f89271e02.json (deflated 94%)
2025-12-04T14:40:36.7390749Z   adding: test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-31757cb9ac5c1c41.json (deflated 95%)
2025-12-04T14:40:36.7391764Z   adding: test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-0fae1aaa4a2003eb.json (deflated 92%)
2025-12-04T14:40:36.7392787Z   adding: test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-bc092756daa3cb92.json (deflated 84%)
2025-12-04T14:40:36.7474137Z   adding: test/test-reports/python-pytest/export.test_serdes/export.test_serdes-707c3510a208c1b4.json (deflated 95%)
2025-12-04T14:40:36.7475056Z   adding: test/test-reports/python-pytest/inductor.test_control_deps/inductor.test_control_deps-c5f7586c63bf6a73.json (deflated 47%)
2025-12-04T14:40:36.7476022Z   adding: test/test-reports/python-pytest/inductor.test_benchmarking/inductor.test_benchmarking-a0634e57d5b5356a.json (deflated 91%)
2025-12-04T14:40:36.7476983Z   adding: test/test-reports/python-pytest/inductor.test_helion_kernels/inductor.test_helion_kernels-31b2988c053252a6.json (deflated 69%)
2025-12-04T14:40:36.7477955Z   adding: test/test-reports/python-pytest/inductor.test_quantization/inductor.test_quantization-84688fa1768a881f.json (deflated 54%)
2025-12-04T14:40:36.7478903Z   adding: test/test-reports/python-pytest/inductor.test_best_config/inductor.test_best_config-423c961bf0014ead.json (deflated 52%)
2025-12-04T14:40:36.7479785Z   adding: test/test-reports/python-pytest/export.test_tools/export.test_tools-21258e52645f11ac.json (deflated 56%)
2025-12-04T14:40:36.7491664Z   adding: test/test-reports/python-pytest/inductor.test_compiled_optimizers/inductor.test_compiled_optimizers-bb39f909a1ebabd7.json (deflated 96%)
2025-12-04T14:40:36.7493908Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor_custom_ops/inductor.test_aot_inductor_custom_ops-a7a529277f9f9a31.json (deflated 94%)
2025-12-04T14:40:36.7506497Z   adding: test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-955a1f614439a238.json (deflated 97%)
2025-12-04T14:40:36.7507798Z   adding: test/test-reports/python-pytest/dynamo.test_cudagraphs/dynamo.test_cudagraphs-e1e65156309c3950.json (deflated 87%)
2025-12-04T14:40:36.7508886Z   adding: test/test-reports/python-pytest/inductor.test_alignment/inductor.test_alignment-e4d33245d4b5500b.json (deflated 91%)
2025-12-04T14:40:36.7511220Z   adding: test/test-reports/python-pytest/dynamo.test_guard_serialization/dynamo.test_guard_serialization-212782fb09fba1b6.json (deflated 90%)
2025-12-04T14:40:36.7512296Z   adding: test/test-reports/python-pytest/inductor.test_needs_exact_strides/inductor.test_needs_exact_strides-9df9f4b52deeb26d.json (deflated 70%)
2025-12-04T14:40:36.7524842Z   adding: test/test-reports/python-pytest/inductor.test_auto_functionalize/inductor.test_auto_functionalize-2721b330ad87dcbb.json (deflated 95%)
2025-12-04T14:40:36.7525980Z   adding: test/test-reports/python-pytest/dynamo.test_modes/dynamo.test_modes-84ffae5b03f7e325.json (deflated 85%)
2025-12-04T14:40:36.7527016Z   adding: test/test-reports/python-pytest/inductor.test_custom_partitioner_fn/inductor.test_custom_partitioner_fn-7cdf0df133f39710.json (deflated 50%)
2025-12-04T14:40:36.7528045Z   adding: test/test-reports/python-pytest/dynamo.test_debug_utils/dynamo.test_debug_utils-34b85abff78e1075.json (deflated 76%)
2025-12-04T14:40:36.7528961Z   adding: test/test-reports/python-pytest/dynamo.test_base_hop/dynamo.test_base_hop-f16fc4276458e68a.json (deflated 79%)
2025-12-04T14:40:36.7536382Z   adding: test/test-reports/python-pytest/dynamo.test_export/dynamo.test_export-b39b0ef66a188fdf.json (deflated 90%)
2025-12-04T14:40:36.7537327Z   adding: test/test-reports/python-pytest/dynamo.test_python_dispatcher/dynamo.test_python_dispatcher-89e5f4289c609732.json (deflated 84%)
2025-12-04T14:40:36.7538244Z   adding: test/test-reports/python-pytest/export.test_swap/export.test_swap-4cc8060b16634bb1.json (deflated 94%)
2025-12-04T14:40:36.7539487Z   adding: test/test-reports/python-pytest/export.test_unflatten/export.test_unflatten-9dcd937885307cef.json (deflated 94%)
2025-12-04T14:40:36.7540443Z   adding: test/test-reports/python-pytest/dynamo.test_verify_correctness/dynamo.test_verify_correctness-22b40053bc190597.json (deflated 72%)
2025-12-04T14:40:36.7541718Z   adding: test/test-reports/python-pytest/dynamo.test_wrap_inductor_compiled_regions/dynamo.test_wrap_inductor_compiled_regions-e50f738759450405.json (deflated 87%)
2025-12-04T14:40:36.7543133Z   adding: test/test-reports/python-pytest/dynamo.test_cudagraphs_expandable_segments/dynamo.test_cudagraphs_expandable_segments-6088bb8977cfc034.json (deflated 87%)
2025-12-04T14:40:36.7546313Z   adding: test/test-reports/python-pytest/inductor.test_caching/inductor.test_caching-81c36c30e8c9f16c.json (deflated 97%)
2025-12-04T14:40:36.7547504Z   adding: test/test-reports/python-pytest/dynamo.test_reorder_logs/dynamo.test_reorder_logs-8774cf5ade30b7b9.json (deflated 87%)
2025-12-04T14:40:36.7552099Z   adding: test/test-reports/python-pytest/dynamo.test_subclasses/dynamo.test_subclasses-debad3a483737c49.json (deflated 93%)
2025-12-04T14:40:36.7553095Z   adding: test/test-reports/python-pytest/dynamo.test_comptime/dynamo.test_comptime-47f6d1d4947f2a1a.json (deflated 83%)
2025-12-04T14:40:36.7554029Z   adding: test/test-reports/python-pytest/test_privateuseone_python_backend/test_privateuseone_python_backend-af92c89cf1e734fe.json (deflated 65%)
2025-12-04T14:40:36.7555013Z   adding: test/test-reports/python-pytest/functorch.test_rearrange/functorch.test_rearrange-2881d42f49f0d5f4.json (deflated 88%)
2025-12-04T14:40:36.7556118Z   adding: test/test-reports/python-pytest/functorch.test_parsing/functorch.test_parsing-b19da695e97869b8.json (deflated 88%)
2025-12-04T14:40:36.7557001Z   adding: test/test-reports/python-pytest/test_varlen_attention/test_varlen_attention-c77e9d85fce2d7de.json (deflated 90%)
2025-12-04T14:40:36.7557827Z   adding: test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-aebaabaedf0418a4.json (deflated 64%)
2025-12-04T14:40:36.7564089Z   adding: test/test-reports/python-pytest/test_cpp_api_parity/test_cpp_api_parity-b67630ba146aa7a3.json (deflated 97%)
2025-12-04T14:40:36.7565074Z   adding: test/test-reports/python-pytest/test_autoload/test_autoload-f8ddaf02f0fba12a.json (deflated 36%)
2025-12-04T14:40:36.7565949Z   adding: test/test-reports/python-pytest/nn.attention.test_open_registry/nn.attention.test_open_registry-090bcad7e8d69cda.json (deflated 63%)
2025-12-04T14:40:36.7566812Z   adding: test/test-reports/python-pytest/xpu.test_fusion/xpu.test_fusion-5eb21ed31dcf28c7.json (stored 0%)
2025-12-04T14:40:36.7615832Z   adding: test/test-reports/python-pytest/test_foreach/test_foreach-08a78fa61936b219.json (deflated 98%)
2025-12-04T14:40:36.7617480Z   adding: test/test-reports/python-pytest/test_pytree/test_pytree-863a0f55639901d8.json (deflated 96%)
2025-12-04T14:40:36.7618527Z   adding: test/test-reports/python-pytest/test_namedtuple_return_api/test_namedtuple_return_api-8cf83ed9877ffdac.json (deflated 76%)
2025-12-04T14:40:36.7619551Z   adding: test/test-reports/python-pytest/profiler.test_record_function/profiler.test_record_function-709377f3af2db71e.json (deflated 85%)
2025-12-04T14:40:36.7620543Z   adding: test/test-reports/python-pytest/test_compile_benchmark_util/test_compile_benchmark_util-827ec108afd6e51f.json (deflated 85%)
2025-12-04T14:40:36.7621568Z   adding: test/test-reports/python-pytest/test_set_default_mobile_cpu_allocator/test_set_default_mobile_cpu_allocator-a62b10a07a12c95d.json (deflated 66%)
2025-12-04T14:40:36.7626190Z   adding: test/test-reports/python-pytest/test_fake_tensor/test_fake_tensor-aa30317e19bcd391.json (deflated 94%)
2025-12-04T14:40:36.7802531Z   adding: test/test-reports/python-pytest/test_binary_ufuncs/test_binary_ufuncs-6264828c6395375f.json (deflated 98%)
2025-12-04T14:40:36.7982669Z   adding: test/test-reports/python-pytest/test_meta/test_meta-be563441d2ad6907.json (deflated 97%)
2025-12-04T14:40:36.8009736Z   adding: test/test-reports/python-pytest/test_fx/test_fx-26e481d16f3bec04.json (deflated 97%)
2025-12-04T14:40:36.8041264Z   adding: test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-e4282282966a7ff9.json (deflated 97%)
2025-12-04T14:40:36.8067302Z   adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-59ab3583fe1f80dc.json (deflated 98%)
2025-12-04T14:40:36.8079563Z   adding: test/test-reports/python-pytest/functorch.test_control_flow/functorch.test_control_flow-5ef9f5430d478fb5.json (deflated 96%)
2025-12-04T14:40:36.8083468Z   adding: test/test-reports/python-pytest/complex_tensor.test_complex_tensor/complex_tensor.test_complex_tensor-b8215f419723e2db.json (deflated 96%)
2025-12-04T14:40:36.8085051Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_pocketfft/torch_np.numpy_tests.fft.test_pocketfft-bedc5291ea06d7d4.json (deflated 97%)
2025-12-04T14:40:36.8112601Z   adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-7c0d16c38d6c5d66.json (deflated 95%)
2025-12-04T14:40:36.8140046Z   adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-53045e36c53c5e0b.json (deflated 95%)
2025-12-04T14:40:36.8141053Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_getlimits/torch_np.numpy_tests.core.test_getlimits-3f363f609079497e.json (deflated 91%)
2025-12-04T14:40:36.8146865Z   adding: test/test-reports/python-pytest/torch_np.test_ndarray_methods/torch_np.test_ndarray_methods-84ccb0941d397f92.json (deflated 98%)
2025-12-04T14:40:36.8151435Z   adding: test/test-reports/python-pytest/test_view_ops/test_view_ops-7140a1ac93a67fd6.json (deflated 96%)
2025-12-04T14:40:36.8208996Z   adding: test/test-reports/python-pytest/test_nn/test_nn-9b13ea9411b68db1.json (deflated 97%)
2025-12-04T14:40:36.8210193Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-6bd6911b2987aa11.json (deflated 95%)
2025-12-04T14:40:36.8211515Z   adding: test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-a884db72e4413a95.json (deflated 91%)
2025-12-04T14:40:36.8214571Z   adding: test/test-reports/python-pytest/nn.test_pooling/nn.test_pooling-21d4709389e5b8ee.json (deflated 96%)
2025-12-04T14:40:36.8217883Z   adding: test/test-reports/python-pytest/nn.test_embedding/nn.test_embedding-ee3bcbb253c53d76.json (deflated 97%)
2025-12-04T14:40:36.8218603Z   adding: test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-f847b013a309e4b4.json (deflated 88%)
2025-12-04T14:40:36.8219293Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-d415fb392a79d6b5.json (deflated 35%)
2025-12-04T14:40:36.8219930Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-020323916fe208b4.json (deflated 33%)
2025-12-04T14:40:36.8220561Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-ec2ab5853a8f5bb5.json (deflated 33%)
2025-12-04T14:40:36.8221188Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-f2344770aa1066fe.json (deflated 34%)
2025-12-04T14:40:36.8221804Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-7974fe8ff8b4bbf1.json (deflated 33%)
2025-12-04T14:40:36.8222431Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-a89ca46644423e33.json (deflated 33%)
2025-12-04T14:40:36.8223039Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-a24d5339b3f8a425.json (deflated 33%)
2025-12-04T14:40:36.8223660Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-e1f683ecde7e9859.json (deflated 33%)
2025-12-04T14:40:36.8224280Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-18c9d7ef1c10ca2f.json (deflated 34%)
2025-12-04T14:40:36.8224897Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c34005545b418b69.json (deflated 33%)
2025-12-04T14:40:36.8225496Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-fef6b14731bb5c3b.json (deflated 33%)
2025-12-04T14:40:36.8226104Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-9f269287574f2f5d.json (deflated 33%)
2025-12-04T14:40:36.8226724Z   adding: test/test-reports/python-pytest/test_native_mha/test_native_mha-92c62998ce15c273.json (deflated 95%)
2025-12-04T14:40:36.8227506Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_numerictypes/torch_np.numpy_tests.core.test_numerictypes-794afd975a25dd08.json (deflated 95%)
2025-12-04T14:40:36.8228350Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-264e490a42609567.json (deflated 37%)
2025-12-04T14:40:36.8229089Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3424f861fbb3f30b.json (deflated 38%)
2025-12-04T14:40:36.8229824Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-34f31a11d054a5e2.json (deflated 38%)
2025-12-04T14:40:36.8230557Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-5cc2b0915fc09b1e.json (deflated 38%)
2025-12-04T14:40:36.8231286Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-8b636380fe175cf1.json (deflated 38%)
2025-12-04T14:40:36.8232021Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3e5b65c914a199ce.json (deflated 38%)
2025-12-04T14:40:36.8232736Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-df0d139575237859.json (deflated 35%)
2025-12-04T14:40:36.8233625Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3264b6f40acb9dc5.json (deflated 37%)
2025-12-04T14:40:36.8234360Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-8f705997ba04d72c.json (deflated 36%)
2025-12-04T14:40:36.8235066Z   adding: test/test-reports/python-pytest/test_function_schema/test_function_schema-23d7f60e4862c430.json (deflated 91%)
2025-12-04T14:40:36.8235736Z   adding: test/test-reports/python-pytest/test_accelerator/test_accelerator-96bd402fc51988fa.json (deflated 88%)
2025-12-04T14:40:36.8236458Z   adding: test/test-reports/python-pytest/nn.test_init/nn.test_init-4d2f3b79bbd3891e.json (deflated 91%)
2025-12-04T14:40:36.8237241Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_scalar_methods/torch_np.numpy_tests.core.test_scalar_methods-ab020ac5345dfbce.json (deflated 97%)
2025-12-04T14:40:36.8238141Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_helper/torch_np.numpy_tests.fft.test_helper-f56f2bb61bb011d4.json (deflated 86%)
2025-12-04T14:40:36.8238945Z   adding: test/test-reports/python-pytest/test_mobile_optimizer/test_mobile_optimizer-6eaefe04adaaf056.json (deflated 83%)
2025-12-04T14:40:36.8253247Z   adding: test/test-reports/python-pytest/test_overrides/test_overrides-54542bcdcb986158.json (deflated 97%)
2025-12-04T14:40:36.8254157Z   adding: test/test-reports/python-pytest/torch_np.test_function_base/torch_np.test_function_base-75280ab4c9ebfe9c.json (deflated 62%)
2025-12-04T14:40:36.8260768Z   adding: test/test-reports/python-pytest/test_type_promotion/test_type_promotion-263da3bcb01bdba5.json (deflated 98%)
2025-12-04T14:40:36.8261697Z   adding: test/test-reports/python-pytest/torch_np.test_scalars_0D_arrays/torch_np.test_scalars_0D_arrays-0d870852e9b8ecce.json (deflated 96%)
2025-12-04T14:40:36.8262616Z   adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-ed390b613a87fd4b.json (deflated 43%)
2025-12-04T14:40:36.8263448Z   adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-30e4b3748ee506f0.json (deflated 42%)
2025-12-04T14:40:36.8264134Z   adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-a99d9031384e91f6.json (deflated 35%)
2025-12-04T14:40:36.8264807Z   adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-f9602eb3757a7925.json (deflated 42%)
2025-12-04T14:40:36.8265550Z   adding: test/test-reports/python-pytest/profiler.test_profiler_tree/profiler.test_profiler_tree-3ad20a65a3b0a20b.json (deflated 87%)
2025-12-04T14:40:36.8266415Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_arraysetops/torch_np.numpy_tests.lib.test_arraysetops-0f1d2c69ab44ce0e.json (deflated 96%)
2025-12-04T14:40:36.8267898Z   adding: test/test-reports/python-pytest/test_dlpack/test_dlpack-41fa7f929a572602.json (deflated 97%)
2025-12-04T14:40:36.8268877Z   adding: test/test-reports/python-pytest/profiler.test_torch_tidy/profiler.test_torch_tidy-6d6836cfdd083f06.json (deflated 85%)
2025-12-04T14:40:36.8269569Z   adding: test/test-reports/python-pytest/lazy.test_reuse_ir/lazy.test_reuse_ir-927d5809a72b7fa4.json (deflated 78%)
2025-12-04T14:40:36.8270323Z   adding: test/test-reports/python-pytest/test_functional_autograd_benchmark/test_functional_autograd_benchmark-0542bd395ed50334.json (deflated 63%)
2025-12-04T14:40:36.8333368Z   adding: test/test-reports/python-pytest/test_reductions/test_reductions-113ccd6215a90199.json (deflated 98%)
2025-12-04T14:40:36.8334277Z   adding: test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204144025.json (deflated 37%)
2025-12-04T14:40:36.8361026Z ##[group]Run # Remove any previous test reports if they exist
2025-12-04T14:40:36.8361375Z [36;1m# Remove any previous test reports if they exist[0m
2025-12-04T14:40:36.8361654Z [36;1mrm -f test-reports-*.zip[0m
2025-12-04T14:40:36.8362056Z [36;1mzip -r "test-reports-${FILE_SUFFIX}.zip" test/test-reports -i '*.xml' -i '*.csv'[0m
2025-12-04T14:40:36.8370054Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T14:40:36.8370336Z env:
2025-12-04T14:40:36.8370493Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:36.8370679Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:36.8370901Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:36.8371281Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:36.8371789Z   FILE_SUFFIX: test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862
2025-12-04T14:40:36.8372265Z ##[endgroup]
2025-12-04T14:40:36.8520855Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-d77224b10dd1e10b.xml (deflated 93%)
2025-12-04T14:40:36.8570654Z   adding: test/test-reports/python-pytest/dynamo.test_repros/dynamo.test_repros-87366e2d7057b5b0.xml (deflated 92%)
2025-12-04T14:40:36.8582749Z   adding: test/test-reports/python-pytest/inductor.test_flex_attention/inductor.test_flex_attention-f32aa134ae4d7a45.xml (deflated 93%)
2025-12-04T14:40:36.8583784Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1d5e50d3220be84.xml (deflated 87%)
2025-12-04T14:40:36.8584855Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-68bd725ac012aaf6.xml (deflated 85%)
2025-12-04T14:40:36.8585904Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f43d696b91c68e27.xml (deflated 85%)
2025-12-04T14:40:36.8586983Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32d3a27d38e00e52.xml (deflated 87%)
2025-12-04T14:40:36.8588031Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d971c2b5fa40f28c.xml (deflated 85%)
2025-12-04T14:40:36.8589317Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d4e5ee130381ea3.xml (deflated 85%)
2025-12-04T14:40:36.8590436Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7f039b6301f03638.xml (deflated 87%)
2025-12-04T14:40:36.8591521Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7547e2319a805dd.xml (deflated 85%)
2025-12-04T14:40:36.8592587Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f314cd6b44b1cdb.xml (deflated 85%)
2025-12-04T14:40:36.8593592Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-31537f65aa77d4f4.xml (deflated 87%)
2025-12-04T14:40:36.8594417Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11e8fb2fd4357c15.xml (deflated 85%)
2025-12-04T14:40:36.8595365Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-75666f3891d9ac7f.xml (deflated 85%)
2025-12-04T14:40:36.8596304Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bbf4ef91870a527.xml (deflated 86%)
2025-12-04T14:40:36.8597287Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-144df5003ab71cee.xml (deflated 85%)
2025-12-04T14:40:36.8598207Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5fd5f82c697f5c0c.xml (deflated 85%)
2025-12-04T14:40:36.8599123Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93ba75b7427cf884.xml (deflated 86%)
2025-12-04T14:40:36.8600139Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db3472bddf12b7a7.xml (deflated 85%)
2025-12-04T14:40:36.8601292Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-97d0cdfaafee5426.xml (deflated 85%)
2025-12-04T14:40:36.8602146Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3a149555401c32cc.xml (deflated 86%)
2025-12-04T14:40:36.8603122Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-64ab9f5424c5493f.xml (deflated 85%)
2025-12-04T14:40:36.8604085Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bdae605562476ceb.xml (deflated 85%)
2025-12-04T14:40:36.8604914Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b2bbf25d96b76c9b.xml (deflated 86%)
2025-12-04T14:40:36.8605878Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5ac824c12758af27.xml (deflated 85%)
2025-12-04T14:40:36.8606718Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5a8939c38696fa6e.xml (deflated 85%)
2025-12-04T14:40:36.8607629Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8de08e52169132e4.xml (deflated 86%)
2025-12-04T14:40:36.8608629Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c872303ed892824.xml (deflated 85%)
2025-12-04T14:40:36.8609537Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86972921d28d1709.xml (deflated 85%)
2025-12-04T14:40:36.8610419Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-08f8aa88da0d4c3d.xml (deflated 86%)
2025-12-04T14:40:36.8611317Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6e5694a381ab599.xml (deflated 85%)
2025-12-04T14:40:36.8612212Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a1c1c2119d10732c.xml (deflated 85%)
2025-12-04T14:40:36.8613046Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c94947c9bb46a4e.xml (deflated 86%)
2025-12-04T14:40:36.8614010Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e2657ebcfa165043.xml (deflated 85%)
2025-12-04T14:40:36.8614936Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e5a9540a53f5bbd7.xml (deflated 85%)
2025-12-04T14:40:36.8615782Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f5535d6178d67f54.xml (deflated 86%)
2025-12-04T14:40:36.8616726Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-839913cdd4a5fdb2.xml (deflated 85%)
2025-12-04T14:40:36.8617888Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca344a44fcbdba6a.xml (deflated 85%)
2025-12-04T14:40:36.8618718Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3d9537209be9ce80.xml (deflated 86%)
2025-12-04T14:40:36.8619678Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b6de87f4ee6a6c38.xml (deflated 85%)
2025-12-04T14:40:36.8620511Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-462df064e3458fc9.xml (deflated 85%)
2025-12-04T14:40:36.8621344Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4f351581eb409e8d.xml (deflated 86%)
2025-12-04T14:40:36.8622465Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-94c0e5e2bee831c2.xml (deflated 85%)
2025-12-04T14:40:36.8623306Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7a973581a4e2c554.xml (deflated 85%)
2025-12-04T14:40:36.8624156Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-518d2a063958b0ac.xml (deflated 86%)
2025-12-04T14:40:36.8625105Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e1cf5b0397cd79e9.xml (deflated 85%)
2025-12-04T14:40:36.8626045Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b693ef47858459cd.xml (deflated 85%)
2025-12-04T14:40:36.8626896Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c603aefabd564f6f.xml (deflated 86%)
2025-12-04T14:40:36.8627843Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-476ed3473033d71c.xml (deflated 85%)
2025-12-04T14:40:36.8628678Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38079583fa3f76bd.xml (deflated 85%)
2025-12-04T14:40:36.8629512Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc6fbe2f84088a12.xml (deflated 86%)
2025-12-04T14:40:36.8630459Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a9efcd8b80cecd97.xml (deflated 85%)
2025-12-04T14:40:36.8631299Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a00be3c10f587c4d.xml (deflated 85%)
2025-12-04T14:40:36.8632149Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-21bfb76ef730b721.xml (deflated 86%)
2025-12-04T14:40:36.8633487Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51fe451ae52d8ee9.xml (deflated 85%)
2025-12-04T14:40:36.8634369Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e7e2b876b221ae6e.xml (deflated 85%)
2025-12-04T14:40:36.8635207Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a296a1ae2f954511.xml (deflated 86%)
2025-12-04T14:40:36.8636056Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-173808d08d9ed556.xml (deflated 85%)
2025-12-04T14:40:36.8637021Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0128790a7f0c548c.xml (deflated 85%)
2025-12-04T14:40:36.8637852Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2efc3529636beb3d.xml (deflated 86%)
2025-12-04T14:40:36.8638785Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5778c6a42245e5c5.xml (deflated 85%)
2025-12-04T14:40:36.8639665Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6cbd45f232782bc2.xml (deflated 85%)
2025-12-04T14:40:36.8640630Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d890e4a6cbb89712.xml (deflated 87%)
2025-12-04T14:40:36.8641476Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b4568ad5eb5915b3.xml (deflated 85%)
2025-12-04T14:40:36.8642386Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ebe2595646f336e.xml (deflated 85%)
2025-12-04T14:40:36.8643347Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cfc45be16d95a5ee.xml (deflated 86%)
2025-12-04T14:40:36.8644191Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4b3d7c6eebbf264b.xml (deflated 85%)
2025-12-04T14:40:36.8645085Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7323c5eff762fde9.xml (deflated 85%)
2025-12-04T14:40:36.8646086Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-db81609099e15efb.xml (deflated 86%)
2025-12-04T14:40:36.8647011Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8169c375ae58c76b.xml (deflated 85%)
2025-12-04T14:40:36.8648041Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ac75bac96a56365f.xml (deflated 85%)
2025-12-04T14:40:36.8648896Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-674ad3938f78a3d3.xml (deflated 86%)
2025-12-04T14:40:36.8649787Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f085b783f0e405ac.xml (deflated 85%)
2025-12-04T14:40:36.8650685Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-84307678eab5d217.xml (deflated 85%)
2025-12-04T14:40:36.8651575Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-93f441dcac87b0dc.xml (deflated 86%)
2025-12-04T14:40:36.8652476Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-093a939a7121f539.xml (deflated 85%)
2025-12-04T14:40:36.8653410Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ed7e6d31e19a7f77.xml (deflated 85%)
2025-12-04T14:40:36.8654262Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fcb36d28ba877da8.xml (deflated 86%)
2025-12-04T14:40:36.8655106Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb130a6ac4c4d42.xml (deflated 85%)
2025-12-04T14:40:36.8656010Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d436a35a57eaea90.xml (deflated 85%)
2025-12-04T14:40:36.8656851Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76b1db4df066ac09.xml (deflated 87%)
2025-12-04T14:40:36.8657742Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bbaa588317639c61.xml (deflated 85%)
2025-12-04T14:40:36.8658770Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7cb6908bcfc4804b.xml (deflated 85%)
2025-12-04T14:40:36.8659710Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cd6d9f99b37f4011.xml (deflated 86%)
2025-12-04T14:40:36.8660617Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d059803612c07abe.xml (deflated 85%)
2025-12-04T14:40:36.8661475Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d2f99eb08b618a0a.xml (deflated 85%)
2025-12-04T14:40:36.8662316Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-76dedcabb72bb30d.xml (deflated 86%)
2025-12-04T14:40:36.8663338Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d102c48975f66f00.xml (deflated 85%)
2025-12-04T14:40:36.8664191Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e12a02efbce3f8f2.xml (deflated 85%)
2025-12-04T14:40:36.8665030Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-835df1857998cf06.xml (deflated 87%)
2025-12-04T14:40:36.8665872Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b90dc48e94da60a1.xml (deflated 85%)
2025-12-04T14:40:36.8666784Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82a5fa72618c2406.xml (deflated 85%)
2025-12-04T14:40:36.8667721Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32c3413eac3481c3.xml (deflated 86%)
2025-12-04T14:40:36.8668641Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b9498a5ec773296.xml (deflated 85%)
2025-12-04T14:40:36.8669512Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d690a534f220c503.xml (deflated 85%)
2025-12-04T14:40:36.8670409Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8635fba9f5b5afed.xml (deflated 86%)
2025-12-04T14:40:36.8671443Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2adccf8b9e051d5a.xml (deflated 85%)
2025-12-04T14:40:36.8672390Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50234d62b4ab45ea.xml (deflated 85%)
2025-12-04T14:40:36.8673276Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fda8ac892cff9b52.xml (deflated 86%)
2025-12-04T14:40:36.8674178Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6daec75554d576a1.xml (deflated 85%)
2025-12-04T14:40:36.8675009Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7357125e19fc0b47.xml (deflated 85%)
2025-12-04T14:40:36.8675846Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1372e7af4dc93064.xml (deflated 86%)
2025-12-04T14:40:36.8676796Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1890de1440f6da93.xml (deflated 85%)
2025-12-04T14:40:36.8677682Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ed71c1109750bb2.xml (deflated 85%)
2025-12-04T14:40:36.8678515Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b4b98b76b112369.xml (deflated 86%)
2025-12-04T14:40:36.8679412Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-fc4f9e9eb787f925.xml (deflated 85%)
2025-12-04T14:40:36.8680371Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cf10e80d579ed1a1.xml (deflated 85%)
2025-12-04T14:40:36.8681210Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f768cb22e37c95bb.xml (deflated 87%)
2025-12-04T14:40:36.8682193Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4cd3ee76d86b3b2d.xml (deflated 85%)
2025-12-04T14:40:36.8683148Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dd5744bb7f1104d.xml (deflated 85%)
2025-12-04T14:40:36.8684071Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8f8bda0471bacaab.xml (deflated 86%)
2025-12-04T14:40:36.8684921Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-39442f28ac15f7dd.xml (deflated 85%)
2025-12-04T14:40:36.8685749Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b5211a64a27fb03.xml (deflated 85%)
2025-12-04T14:40:36.8686652Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-03811a38f7309b37.xml (deflated 86%)
2025-12-04T14:40:36.8687560Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c3f4a82c64f8b823.xml (deflated 85%)
2025-12-04T14:40:36.8688386Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c722331da90a17a1.xml (deflated 85%)
2025-12-04T14:40:36.8689228Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-715dcfb7265e7117.xml (deflated 87%)
2025-12-04T14:40:36.8690176Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-af0a42bb02245e10.xml (deflated 85%)
2025-12-04T14:40:36.8691032Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5a947cb713f2103.xml (deflated 85%)
2025-12-04T14:40:36.8691873Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c9a860fbca8c784e.xml (deflated 86%)
2025-12-04T14:40:36.8692843Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-57d06208bb64cb40.xml (deflated 85%)
2025-12-04T14:40:36.8693715Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-27d39a08641974ca.xml (deflated 85%)
2025-12-04T14:40:36.8694546Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c115897706ac37ea.xml (deflated 86%)
2025-12-04T14:40:36.8695466Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-99c6159c4eb555cf.xml (deflated 85%)
2025-12-04T14:40:36.8696298Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71859eedfe6269a5.xml (deflated 85%)
2025-12-04T14:40:36.8697135Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b7ad6bc433aca4f5.xml (deflated 86%)
2025-12-04T14:40:36.8698054Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8eb71453e3d3b813.xml (deflated 85%)
2025-12-04T14:40:36.8698968Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1f8f7752fccd9869.xml (deflated 85%)
2025-12-04T14:40:36.8699868Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b45e15b9b3058993.xml (deflated 86%)
2025-12-04T14:40:36.8700774Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ab51729f4958ddc5.xml (deflated 85%)
2025-12-04T14:40:36.8701695Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c75b79372dbd5cd7.xml (deflated 85%)
2025-12-04T14:40:36.8702525Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e6e05e1cd235f382.xml (deflated 86%)
2025-12-04T14:40:36.8703416Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b5713604e4d5a687.xml (deflated 85%)
2025-12-04T14:40:36.8704419Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98fe1568229d1f43.xml (deflated 85%)
2025-12-04T14:40:36.8705353Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-89a0569137f2a5f8.xml (deflated 86%)
2025-12-04T14:40:36.8706215Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-26852b57f22709e5.xml (deflated 85%)
2025-12-04T14:40:36.8707194Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-51aaf4e0af1c22f7.xml (deflated 85%)
2025-12-04T14:40:36.8708207Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dc138e7c3d90d405.xml (deflated 86%)
2025-12-04T14:40:36.8709147Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11f1088c00e16c8c.xml (deflated 85%)
2025-12-04T14:40:36.8710134Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3523d5aaa7729d0c.xml (deflated 85%)
2025-12-04T14:40:36.8710971Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-70de31050b612090.xml (deflated 86%)
2025-12-04T14:40:36.8711895Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96a27193d0a2e839.xml (deflated 85%)
2025-12-04T14:40:36.8712827Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cc5a2675f46e34d3.xml (deflated 85%)
2025-12-04T14:40:36.8713689Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-37ac0a15b5eff353.xml (deflated 86%)
2025-12-04T14:40:36.8714536Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a5da48d7d65453d4.xml (deflated 85%)
2025-12-04T14:40:36.8715487Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6dba8e879764f929.xml (deflated 85%)
2025-12-04T14:40:36.8716336Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0d4d42e91b0ff091.xml (deflated 86%)
2025-12-04T14:40:36.8717356Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-659fbe1db9f9f989.xml (deflated 85%)
2025-12-04T14:40:36.8718876Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0f9da1e77120ab8a.xml (deflated 85%)
2025-12-04T14:40:36.8719957Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-91b8af2bf22e5dbf.xml (deflated 86%)
2025-12-04T14:40:36.8720951Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6902f75d647c91e7.xml (deflated 85%)
2025-12-04T14:40:36.8721836Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9bee3dd53feb5961.xml (deflated 85%)
2025-12-04T14:40:36.8722745Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-17bc86173edb9567.xml (deflated 86%)
2025-12-04T14:40:36.8723677Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-12cbce7f716a0669.xml (deflated 85%)
2025-12-04T14:40:36.8724562Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71532e1cbeaa1931.xml (deflated 85%)
2025-12-04T14:40:36.8725441Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1cbc5ac56a047f28.xml (deflated 86%)
2025-12-04T14:40:36.8726558Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a3505d51f13f273.xml (deflated 85%)
2025-12-04T14:40:36.8727553Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4bc023f248c82374.xml (deflated 85%)
2025-12-04T14:40:36.8728408Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-80114f319d6e3dd1.xml (deflated 86%)
2025-12-04T14:40:36.8729419Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c5d4e24682433b20.xml (deflated 85%)
2025-12-04T14:40:36.8730247Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1259359197037313.xml (deflated 85%)
2025-12-04T14:40:36.8731057Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50141705a26d91cc.xml (deflated 86%)
2025-12-04T14:40:36.8731907Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-032da2f374cad8bd.xml (deflated 85%)
2025-12-04T14:40:36.8732826Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0bdf2ccaad64a4e2.xml (deflated 85%)
2025-12-04T14:40:36.8733695Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f7b942a83386066d.xml (deflated 87%)
2025-12-04T14:40:36.8734594Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-96d9c65e819c8d75.xml (deflated 85%)
2025-12-04T14:40:36.8735456Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86387a3ec48a5612.xml (deflated 85%)
2025-12-04T14:40:36.8736313Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-86b90fcd4d18651c.xml (deflated 86%)
2025-12-04T14:40:36.8737244Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c3b1417ea80e2f0.xml (deflated 85%)
2025-12-04T14:40:36.8738119Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7815a5e2a911334a.xml (deflated 85%)
2025-12-04T14:40:36.8739033Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8773df7cdfc9f682.xml (deflated 87%)
2025-12-04T14:40:36.8739962Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f18b468d408a9813.xml (deflated 85%)
2025-12-04T14:40:36.8740795Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ada8ba8d71fda760.xml (deflated 85%)
2025-12-04T14:40:36.8741653Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6bbd4f6ab2b6130.xml (deflated 86%)
2025-12-04T14:40:36.8742582Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6b9751c3a5f583fd.xml (deflated 85%)
2025-12-04T14:40:36.8743518Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f1e64a16331aaa14.xml (deflated 85%)
2025-12-04T14:40:36.8744352Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2b30602e906f7649.xml (deflated 28%)
2025-12-04T14:40:36.8785775Z   adding: test/test-reports/python-pytest/inductor.test_compile_subprocess/inductor.test_compile_subprocess-2cb94ab29d3c0df8.xml (deflated 96%)
2025-12-04T14:40:36.8791604Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-ab5aa4d4069f84fb.xml (deflated 91%)
2025-12-04T14:40:36.8797427Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-35f77a6004c12574.xml (deflated 91%)
2025-12-04T14:40:36.8803170Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-4e86df63b0f31fa2.xml (deflated 91%)
2025-12-04T14:40:36.8809200Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-421be54fd2475226.xml (deflated 91%)
2025-12-04T14:40:36.8815074Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-269f711d29febed7.xml (deflated 91%)
2025-12-04T14:40:36.8873651Z   adding: test/test-reports/python-pytest/test_ops/test_ops-f35e359ea3f52347.xml (deflated 94%)
2025-12-04T14:40:36.8942041Z   adding: test/test-reports/python-pytest/test_ops/test_ops-7327fc5de50caef8.xml (deflated 95%)
2025-12-04T14:40:36.8970663Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-e6d2768dce09d0dd.xml (deflated 93%)
2025-12-04T14:40:36.8974903Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-52326583abfcb307.xml (deflated 93%)
2025-12-04T14:40:36.8978996Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-b0e5cfa73b17bf79.xml (deflated 93%)
2025-12-04T14:40:36.8983119Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-44c334397fb0c3bd.xml (deflated 93%)
2025-12-04T14:40:36.8988833Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3098e6f6c63481df.xml (deflated 90%)
2025-12-04T14:40:36.9020132Z   adding: test/test-reports/python-pytest/inductor.test_compiled_autograd/inductor.test_compiled_autograd-d8fc516c8be54fc6.xml (deflated 92%)
2025-12-04T14:40:36.9021101Z   adding: test/test-reports/python-pytest/inductor.test_layout_optim/inductor.test_layout_optim-ff0e0fc528f4f3dd.xml (deflated 27%)
2025-12-04T14:40:36.9035777Z   adding: test/test-reports/python-pytest/dynamo.test_exc/dynamo.test_exc-59dcc92175511a1b.xml (deflated 95%)
2025-12-04T14:40:36.9042496Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor_arrayref/inductor.test_aot_inductor_arrayref-c35059cecd7c3b99.xml (deflated 93%)
2025-12-04T14:40:36.9043536Z   adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-ccee7b90c33901e0.xml (deflated 86%)
2025-12-04T14:40:36.9044501Z   adding: test/test-reports/python-pytest/dynamo.test_deque_reconstruct/dynamo.test_deque_reconstruct-4527efee43b2418d.xml (deflated 68%)
2025-12-04T14:40:36.9045536Z   adding: test/test-reports/python-pytest/inductor.test_inductor_annotations/inductor.test_inductor_annotations-1bfa13dfa66ba37a.xml (deflated 68%)
2025-12-04T14:40:36.9046551Z   adding: test/test-reports/python-pytest/inductor.test_compile_worker/inductor.test_compile_worker-d291715b5fb08603.xml (deflated 83%)
2025-12-04T14:40:36.9047498Z   adding: test/test-reports/python-pytest/dynamo.test_fx_passes_pre_grad/dynamo.test_fx_passes_pre_grad-8f5f76cf24e3a322.xml (deflated 35%)
2025-12-04T14:40:36.9048378Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-dccd0f4af0dde98e.xml (deflated 92%)
2025-12-04T14:40:36.9049161Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-70b63fabd52069fd.xml (deflated 85%)
2025-12-04T14:40:36.9049943Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8f8e73b1ecdff271.xml (deflated 85%)
2025-12-04T14:40:36.9052441Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b260dbfbe2039817.xml (deflated 94%)
2025-12-04T14:40:36.9053238Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6d60c9510a0f37ea.xml (deflated 72%)
2025-12-04T14:40:36.9059560Z   adding: test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3a4ab85c9fd562b7.xml (deflated 97%)
2025-12-04T14:40:36.9061061Z   adding: test/test-reports/python-pytest/inductor.test_flex_flash/inductor.test_flex_flash-594b02277dbafddb.xml (deflated 96%)
2025-12-04T14:40:36.9062170Z   adding: test/test-reports/python-pytest/inductor.test_segmented_tree/inductor.test_segmented_tree-3edcd06141f9e439.xml (deflated 79%)
2025-12-04T14:40:36.9063206Z   adding: test/test-reports/python-pytest/inductor.test_kernel_optimization/inductor.test_kernel_optimization-b711ca038cf9dded.xml (deflated 38%)
2025-12-04T14:40:36.9064006Z   adding: test/test-reports/python-pytest/inductor.test_metrics/inductor.test_metrics-b803ef0c4a9491e7.xml (deflated 69%)
2025-12-04T14:40:36.9064758Z   adding: test/test-reports/python-pytest/export.test_unflatten_training_ir/export.test_unflatten_training_ir-0b06d16f89271e02.xml (deflated 92%)
2025-12-04T14:40:36.9078666Z   adding: test/test-reports/python-pytest/inductor.test_triton_kernels/inductor.test_triton_kernels-31757cb9ac5c1c41.xml (deflated 94%)
2025-12-04T14:40:36.9079649Z   adding: test/test-reports/python-pytest/inductor.test_cutedsl_template/inductor.test_cutedsl_template-0fae1aaa4a2003eb.xml (deflated 88%)
2025-12-04T14:40:36.9080715Z   adding: test/test-reports/python-pytest/inductor.test_benchmark_fusion/inductor.test_benchmark_fusion-bc092756daa3cb92.xml (deflated 80%)
2025-12-04T14:40:36.9151443Z   adding: test/test-reports/python-pytest/export.test_serdes/export.test_serdes-707c3510a208c1b4.xml (deflated 95%)
2025-12-04T14:40:36.9152312Z   adding: test/test-reports/python-pytest/inductor.test_control_deps/inductor.test_control_deps-c5f7586c63bf6a73.xml (deflated 46%)
2025-12-04T14:40:36.9153241Z   adding: test/test-reports/python-pytest/inductor.test_benchmarking/inductor.test_benchmarking-a0634e57d5b5356a.xml (deflated 87%)
2025-12-04T14:40:36.9154194Z   adding: test/test-reports/python-pytest/inductor.test_helion_kernels/inductor.test_helion_kernels-31b2988c053252a6.xml (deflated 62%)
2025-12-04T14:40:36.9155132Z   adding: test/test-reports/python-pytest/inductor.test_quantization/inductor.test_quantization-84688fa1768a881f.xml (deflated 47%)
2025-12-04T14:40:36.9156046Z   adding: test/test-reports/python-pytest/inductor.test_best_config/inductor.test_best_config-423c961bf0014ead.xml (deflated 51%)
2025-12-04T14:40:36.9156887Z   adding: test/test-reports/python-pytest/export.test_tools/export.test_tools-21258e52645f11ac.xml (deflated 47%)
2025-12-04T14:40:36.9167377Z   adding: test/test-reports/python-pytest/inductor.test_compiled_optimizers/inductor.test_compiled_optimizers-bb39f909a1ebabd7.xml (deflated 96%)
2025-12-04T14:40:36.9169246Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor_custom_ops/inductor.test_aot_inductor_custom_ops-a7a529277f9f9a31.xml (deflated 93%)
2025-12-04T14:40:36.9181157Z   adding: test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-955a1f614439a238.xml (deflated 96%)
2025-12-04T14:40:36.9182391Z   adding: test/test-reports/python-pytest/dynamo.test_cudagraphs/dynamo.test_cudagraphs-e1e65156309c3950.xml (deflated 85%)
2025-12-04T14:40:36.9183266Z   adding: test/test-reports/python-pytest/inductor.test_alignment/inductor.test_alignment-e4d33245d4b5500b.xml (deflated 89%)
2025-12-04T14:40:36.9185294Z   adding: test/test-reports/python-pytest/dynamo.test_guard_serialization/dynamo.test_guard_serialization-212782fb09fba1b6.xml (deflated 86%)
2025-12-04T14:40:36.9186106Z   adding: test/test-reports/python-pytest/inductor.test_needs_exact_strides/inductor.test_needs_exact_strides-9df9f4b52deeb26d.xml (deflated 64%)
2025-12-04T14:40:36.9198146Z   adding: test/test-reports/python-pytest/inductor.test_auto_functionalize/inductor.test_auto_functionalize-2721b330ad87dcbb.xml (deflated 95%)
2025-12-04T14:40:36.9199054Z   adding: test/test-reports/python-pytest/dynamo.test_modes/dynamo.test_modes-84ffae5b03f7e325.xml (deflated 75%)
2025-12-04T14:40:36.9200031Z   adding: test/test-reports/python-pytest/inductor.test_custom_partitioner_fn/inductor.test_custom_partitioner_fn-7cdf0df133f39710.xml (deflated 49%)
2025-12-04T14:40:36.9200980Z   adding: test/test-reports/python-pytest/dynamo.test_debug_utils/dynamo.test_debug_utils-34b85abff78e1075.xml (deflated 62%)
2025-12-04T14:40:36.9201993Z   adding: test/test-reports/python-pytest/dynamo.test_base_hop/dynamo.test_base_hop-f16fc4276458e68a.xml (deflated 74%)
2025-12-04T14:40:36.9208320Z   adding: test/test-reports/python-pytest/dynamo.test_export/dynamo.test_export-b39b0ef66a188fdf.xml (deflated 87%)
2025-12-04T14:40:36.9209030Z   adding: test/test-reports/python-pytest/dynamo.test_python_dispatcher/dynamo.test_python_dispatcher-89e5f4289c609732.xml (deflated 77%)
2025-12-04T14:40:36.9209832Z   adding: test/test-reports/python-pytest/export.test_swap/export.test_swap-4cc8060b16634bb1.xml (deflated 93%)
2025-12-04T14:40:36.9211296Z   adding: test/test-reports/python-pytest/export.test_unflatten/export.test_unflatten-9dcd937885307cef.xml (deflated 92%)
2025-12-04T14:40:36.9212033Z   adding: test/test-reports/python-pytest/dynamo.test_verify_correctness/dynamo.test_verify_correctness-22b40053bc190597.xml (deflated 64%)
2025-12-04T14:40:36.9213424Z   adding: test/test-reports/python-pytest/dynamo.test_wrap_inductor_compiled_regions/dynamo.test_wrap_inductor_compiled_regions-e50f738759450405.xml (deflated 83%)
2025-12-04T14:40:36.9214805Z   adding: test/test-reports/python-pytest/dynamo.test_cudagraphs_expandable_segments/dynamo.test_cudagraphs_expandable_segments-6088bb8977cfc034.xml (deflated 85%)
2025-12-04T14:40:36.9217230Z   adding: test/test-reports/python-pytest/inductor.test_caching/inductor.test_caching-81c36c30e8c9f16c.xml (deflated 95%)
2025-12-04T14:40:36.9218555Z   adding: test/test-reports/python-pytest/dynamo.test_reorder_logs/dynamo.test_reorder_logs-8774cf5ade30b7b9.xml (deflated 85%)
2025-12-04T14:40:36.9222315Z   adding: test/test-reports/python-pytest/dynamo.test_subclasses/dynamo.test_subclasses-debad3a483737c49.xml (deflated 90%)
2025-12-04T14:40:36.9223304Z   adding: test/test-reports/python-pytest/dynamo.test_comptime/dynamo.test_comptime-47f6d1d4947f2a1a.xml (deflated 80%)
2025-12-04T14:40:36.9224237Z   adding: test/test-reports/python-pytest/test_privateuseone_python_backend/test_privateuseone_python_backend-af92c89cf1e734fe.xml (deflated 51%)
2025-12-04T14:40:36.9225219Z   adding: test/test-reports/python-pytest/functorch.test_rearrange/functorch.test_rearrange-2881d42f49f0d5f4.xml (deflated 77%)
2025-12-04T14:40:36.9226106Z   adding: test/test-reports/python-pytest/functorch.test_parsing/functorch.test_parsing-b19da695e97869b8.xml (deflated 77%)
2025-12-04T14:40:36.9226961Z   adding: test/test-reports/python-pytest/test_varlen_attention/test_varlen_attention-c77e9d85fce2d7de.xml (deflated 83%)
2025-12-04T14:40:36.9227764Z   adding: test/test-reports/python-pytest/test_mkl_verbose/test_mkl_verbose-aebaabaedf0418a4.xml (deflated 50%)
2025-12-04T14:40:36.9231557Z   adding: test/test-reports/python-pytest/test_cpp_api_parity/test_cpp_api_parity-b67630ba146aa7a3.xml (deflated 94%)
2025-12-04T14:40:36.9232323Z   adding: test/test-reports/python-pytest/test_autoload/test_autoload-f8ddaf02f0fba12a.xml (deflated 38%)
2025-12-04T14:40:36.9233178Z   adding: test/test-reports/python-pytest/nn.attention.test_open_registry/nn.attention.test_open_registry-090bcad7e8d69cda.xml (deflated 51%)
2025-12-04T14:40:36.9233951Z   adding: test/test-reports/python-pytest/xpu.test_fusion/xpu.test_fusion-5eb21ed31dcf28c7.xml (deflated 27%)
2025-12-04T14:40:36.9271245Z   adding: test/test-reports/python-pytest/test_foreach/test_foreach-08a78fa61936b219.xml (deflated 96%)
2025-12-04T14:40:36.9272386Z   adding: test/test-reports/python-pytest/test_pytree/test_pytree-863a0f55639901d8.xml (deflated 92%)
2025-12-04T14:40:36.9273189Z   adding: test/test-reports/python-pytest/test_namedtuple_return_api/test_namedtuple_return_api-8cf83ed9877ffdac.xml (deflated 72%)
2025-12-04T14:40:36.9274159Z   adding: test/test-reports/python-pytest/profiler.test_record_function/profiler.test_record_function-709377f3af2db71e.xml (deflated 74%)
2025-12-04T14:40:36.9275103Z   adding: test/test-reports/python-pytest/test_compile_benchmark_util/test_compile_benchmark_util-827ec108afd6e51f.xml (deflated 85%)
2025-12-04T14:40:36.9276255Z   adding: test/test-reports/python-pytest/test_set_default_mobile_cpu_allocator/test_set_default_mobile_cpu_allocator-a62b10a07a12c95d.xml (deflated 52%)
2025-12-04T14:40:36.9279690Z   adding: test/test-reports/python-pytest/test_fake_tensor/test_fake_tensor-aa30317e19bcd391.xml (deflated 90%)
2025-12-04T14:40:36.9396414Z   adding: test/test-reports/python-pytest/test_binary_ufuncs/test_binary_ufuncs-6264828c6395375f.xml (deflated 97%)
2025-12-04T14:40:36.9548755Z   adding: test/test-reports/python-pytest/test_meta/test_meta-be563441d2ad6907.xml (deflated 96%)
2025-12-04T14:40:36.9573048Z   adding: test/test-reports/python-pytest/test_fx/test_fx-26e481d16f3bec04.xml (deflated 95%)
2025-12-04T14:40:36.9599858Z   adding: test/test-reports/python-pytest/test_ops_gradients/test_ops_gradients-e4282282966a7ff9.xml (deflated 96%)
2025-12-04T14:40:36.9624197Z   adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-59ab3583fe1f80dc.xml (deflated 98%)
2025-12-04T14:40:36.9634303Z   adding: test/test-reports/python-pytest/functorch.test_control_flow/functorch.test_control_flow-5ef9f5430d478fb5.xml (deflated 94%)
2025-12-04T14:40:36.9636845Z   adding: test/test-reports/python-pytest/complex_tensor.test_complex_tensor/complex_tensor.test_complex_tensor-b8215f419723e2db.xml (deflated 94%)
2025-12-04T14:40:36.9637986Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_pocketfft/torch_np.numpy_tests.fft.test_pocketfft-bedc5291ea06d7d4.xml (deflated 95%)
2025-12-04T14:40:36.9660370Z   adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-7c0d16c38d6c5d66.xml (deflated 93%)
2025-12-04T14:40:36.9681926Z   adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-53045e36c53c5e0b.xml (deflated 93%)
2025-12-04T14:40:36.9682890Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_getlimits/torch_np.numpy_tests.core.test_getlimits-3f363f609079497e.xml (deflated 86%)
2025-12-04T14:40:36.9686419Z   adding: test/test-reports/python-pytest/torch_np.test_ndarray_methods/torch_np.test_ndarray_methods-84ccb0941d397f92.xml (deflated 96%)
2025-12-04T14:40:36.9689731Z   adding: test/test-reports/python-pytest/test_view_ops/test_view_ops-7140a1ac93a67fd6.xml (deflated 92%)
2025-12-04T14:40:36.9735467Z   adding: test/test-reports/python-pytest/test_nn/test_nn-9b13ea9411b68db1.xml (deflated 97%)
2025-12-04T14:40:36.9736598Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_index_tricks/torch_np.numpy_tests.lib.test_index_tricks-6bd6911b2987aa11.xml (deflated 91%)
2025-12-04T14:40:36.9737627Z   adding: test/test-reports/python-pytest/test_jit_autocast/test_jit_autocast-a884db72e4413a95.xml (deflated 86%)
2025-12-04T14:40:36.9740100Z   adding: test/test-reports/python-pytest/nn.test_pooling/nn.test_pooling-21d4709389e5b8ee.xml (deflated 92%)
2025-12-04T14:40:36.9742506Z   adding: test/test-reports/python-pytest/nn.test_embedding/nn.test_embedding-ee3bcbb253c53d76.xml (deflated 95%)
2025-12-04T14:40:36.9743359Z   adding: test/test-reports/python-pytest/test_xnnpack_integration/test_xnnpack_integration-f847b013a309e4b4.xml (deflated 81%)
2025-12-04T14:40:36.9744053Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-d415fb392a79d6b5.xml (deflated 37%)
2025-12-04T14:40:36.9744645Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-020323916fe208b4.xml (deflated 35%)
2025-12-04T14:40:36.9745241Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-ec2ab5853a8f5bb5.xml (deflated 35%)
2025-12-04T14:40:36.9745840Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-f2344770aa1066fe.xml (deflated 35%)
2025-12-04T14:40:36.9746428Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-7974fe8ff8b4bbf1.xml (deflated 35%)
2025-12-04T14:40:36.9747011Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-a89ca46644423e33.xml (deflated 35%)
2025-12-04T14:40:36.9747595Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-a24d5339b3f8a425.xml (deflated 35%)
2025-12-04T14:40:36.9748328Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-e1f683ecde7e9859.xml (deflated 36%)
2025-12-04T14:40:36.9748937Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-18c9d7ef1c10ca2f.xml (deflated 35%)
2025-12-04T14:40:36.9749524Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-c34005545b418b69.xml (deflated 35%)
2025-12-04T14:40:36.9750105Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-fef6b14731bb5c3b.xml (deflated 36%)
2025-12-04T14:40:36.9750795Z   adding: test/test-reports/python-pytest/test_cuda_trace/test_cuda_trace-9f269287574f2f5d.xml (deflated 35%)
2025-12-04T14:40:36.9751380Z   adding: test/test-reports/python-pytest/test_native_mha/test_native_mha-92c62998ce15c273.xml (deflated 93%)
2025-12-04T14:40:36.9752143Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_numerictypes/torch_np.numpy_tests.core.test_numerictypes-794afd975a25dd08.xml (deflated 90%)
2025-12-04T14:40:36.9752962Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-264e490a42609567.xml (deflated 37%)
2025-12-04T14:40:36.9753674Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3424f861fbb3f30b.xml (deflated 36%)
2025-12-04T14:40:36.9754375Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-34f31a11d054a5e2.xml (deflated 37%)
2025-12-04T14:40:36.9755073Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-5cc2b0915fc09b1e.xml (deflated 36%)
2025-12-04T14:40:36.9755768Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-8b636380fe175cf1.xml (deflated 36%)
2025-12-04T14:40:36.9756466Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3e5b65c914a199ce.xml (deflated 37%)
2025-12-04T14:40:36.9757159Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-df0d139575237859.xml (deflated 34%)
2025-12-04T14:40:36.9757875Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-3264b6f40acb9dc5.xml (deflated 36%)
2025-12-04T14:40:36.9758569Z   adding: test/test-reports/python-pytest/test_cuda_nvml_based_avail/test_cuda_nvml_based_avail-8f705997ba04d72c.xml (deflated 34%)
2025-12-04T14:40:36.9759242Z   adding: test/test-reports/python-pytest/test_function_schema/test_function_schema-23d7f60e4862c430.xml (deflated 82%)
2025-12-04T14:40:36.9759878Z   adding: test/test-reports/python-pytest/test_accelerator/test_accelerator-96bd402fc51988fa.xml (deflated 80%)
2025-12-04T14:40:36.9760574Z   adding: test/test-reports/python-pytest/nn.test_init/nn.test_init-4d2f3b79bbd3891e.xml (deflated 83%)
2025-12-04T14:40:36.9761326Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_scalar_methods/torch_np.numpy_tests.core.test_scalar_methods-ab020ac5345dfbce.xml (deflated 96%)
2025-12-04T14:40:36.9762209Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_helper/torch_np.numpy_tests.fft.test_helper-f56f2bb61bb011d4.xml (deflated 74%)
2025-12-04T14:40:36.9762962Z   adding: test/test-reports/python-pytest/test_mobile_optimizer/test_mobile_optimizer-6eaefe04adaaf056.xml (deflated 80%)
2025-12-04T14:40:36.9770791Z   adding: test/test-reports/python-pytest/test_overrides/test_overrides-54542bcdcb986158.xml (deflated 95%)
2025-12-04T14:40:36.9771511Z   adding: test/test-reports/python-pytest/torch_np.test_function_base/torch_np.test_function_base-75280ab4c9ebfe9c.xml (deflated 47%)
2025-12-04T14:40:36.9776631Z   adding: test/test-reports/python-pytest/test_type_promotion/test_type_promotion-263da3bcb01bdba5.xml (deflated 96%)
2025-12-04T14:40:36.9777362Z   adding: test/test-reports/python-pytest/torch_np.test_scalars_0D_arrays/torch_np.test_scalars_0D_arrays-0d870852e9b8ecce.xml (deflated 91%)
2025-12-04T14:40:36.9778166Z   adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-ed390b613a87fd4b.xml (deflated 43%)
2025-12-04T14:40:36.9778839Z   adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-30e4b3748ee506f0.xml (deflated 42%)
2025-12-04T14:40:36.9779494Z   adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-a99d9031384e91f6.xml (deflated 36%)
2025-12-04T14:40:36.9780149Z   adding: test/test-reports/python-pytest/test_cuda_primary_ctx/test_cuda_primary_ctx-f9602eb3757a7925.xml (deflated 42%)
2025-12-04T14:40:36.9780916Z   adding: test/test-reports/python-pytest/profiler.test_profiler_tree/profiler.test_profiler_tree-3ad20a65a3b0a20b.xml (deflated 82%)
2025-12-04T14:40:36.9781740Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_arraysetops/torch_np.numpy_tests.lib.test_arraysetops-0f1d2c69ab44ce0e.xml (deflated 95%)
2025-12-04T14:40:36.9782728Z   adding: test/test-reports/python-pytest/test_dlpack/test_dlpack-41fa7f929a572602.xml (deflated 94%)
2025-12-04T14:40:36.9783599Z   adding: test/test-reports/python-pytest/profiler.test_torch_tidy/profiler.test_torch_tidy-6d6836cfdd083f06.xml (deflated 76%)
2025-12-04T14:40:36.9784277Z   adding: test/test-reports/python-pytest/lazy.test_reuse_ir/lazy.test_reuse_ir-927d5809a72b7fa4.xml (deflated 62%)
2025-12-04T14:40:36.9785006Z   adding: test/test-reports/python-pytest/test_functional_autograd_benchmark/test_functional_autograd_benchmark-0542bd395ed50334.xml (deflated 54%)
2025-12-04T14:40:36.9828935Z   adding: test/test-reports/python-pytest/test_reductions/test_reductions-113ccd6215a90199.xml (deflated 96%)
2025-12-04T14:40:36.9829769Z   adding: test/test-reports/python-unittest/test_autoload/TEST-TestDeviceBackendAutoload-20251204144025.xml (deflated 43%)
2025-12-04T14:40:36.9868889Z ##[group]Run # Remove any previous usage logs if they exist
2025-12-04T14:40:36.9869227Z [36;1m# Remove any previous usage logs if they exist[0m
2025-12-04T14:40:36.9869488Z [36;1mrm -f logs-*.zip[0m
2025-12-04T14:40:36.9869741Z [36;1mzip "logs-${FILE_SUFFIX}.zip" 'usage_log.txt' || true[0m
2025-12-04T14:40:36.9870102Z [36;1mzip -r "logs-${FILE_SUFFIX}.zip" test/test-reports -i '*.log' || true[0m
2025-12-04T14:40:36.9877296Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T14:40:36.9877584Z env:
2025-12-04T14:40:36.9877738Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:36.9877930Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:36.9878173Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:36.9878568Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:36.9879070Z   FILE_SUFFIX: test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862
2025-12-04T14:40:36.9879423Z ##[endgroup]
2025-12-04T14:40:36.9958330Z   adding: usage_log.txt (deflated 58%)
2025-12-04T14:40:37.0028670Z   adding: test/test-reports/inductor.test_aot_inductor_2.4_15a925ff16cb0669_.log (deflated 91%)
2025-12-04T14:40:37.0037365Z   adding: test/test-reports/dynamo.test_repros_1.1_21fd2d3c0d4dd552_.log (deflated 85%)
2025-12-04T14:40:37.0041567Z   adding: test/test-reports/inductor.test_flex_attention_2.6_90be3f66c016358d_.log (deflated 90%)
2025-12-04T14:40:37.0172892Z   adding: test/test-reports/inductor.test_cuda_select_algorithm_1.1_c5144f504c6801ae_.log (deflated 97%)
2025-12-04T14:40:37.0221042Z   adding: test/test-reports/inductor.test_compile_subprocess_2.2_b00c67e905398654_.log (deflated 95%)
2025-12-04T14:40:37.0230928Z   adding: test/test-reports/test_decomp_1.22_14e10bdd16255327_.log (deflated 89%)
2025-12-04T14:40:37.0240757Z   adding: test/test-reports/test_decomp_5.22_0df19fb5b56a60f0_.log (deflated 88%)
2025-12-04T14:40:37.0249956Z   adding: test/test-reports/test_decomp_10.22_12f8ad2c135e2bbb_.log (deflated 88%)
2025-12-04T14:40:37.0260064Z   adding: test/test-reports/test_decomp_15.22_3612d41b87c57e18_.log (deflated 88%)
2025-12-04T14:40:37.0269686Z   adding: test/test-reports/test_decomp_20.22_0285151ec6a3cff1_.log (deflated 88%)
2025-12-04T14:40:37.0350109Z   adding: test/test-reports/test_ops_3.9_cd041df637bc19f1_.log (deflated 92%)
2025-12-04T14:40:37.0429668Z   adding: test/test-reports/test_ops_8.9_44eee4e8e92c270e_.log (deflated 92%)
2025-12-04T14:40:37.0430302Z   adding: test/test-reports/test_cuda_primary_ctx_1.1_e42f702cb40c5a59_.log (deflated 85%)
2025-12-04T14:40:37.0446965Z   adding: test/test-reports/inductor.test_torchinductor_dynamic_shapes_4.4_d7a417aa701cd416_.log (deflated 92%)
2025-12-04T14:40:37.0456406Z   adding: test/test-reports/inductor.test_torchinductor_opinfo_2.13_c074395000e8f728_.log (deflated 92%)
2025-12-04T14:40:37.0465143Z   adding: test/test-reports/inductor.test_torchinductor_opinfo_7.13_206d55120439a46b_.log (deflated 93%)
2025-12-04T14:40:37.0465859Z   adding: test/test-reports/profiler.test_profiler_tree_1.1_8c9f11eb1f1e482c_.log (deflated 77%)
2025-12-04T14:40:37.0473382Z   adding: test/test-reports/inductor.test_torchinductor_opinfo_12.13_b0b968134062f752_.log (deflated 91%)
2025-12-04T14:40:37.0475866Z   adding: test/test-reports/inductor.test_cuda_repro_1.1_1c12e5d6528c7a17_.log (deflated 84%)
2025-12-04T14:40:37.0490439Z   adding: test/test-reports/inductor.test_compiled_autograd_1.2_dcced55d4b6d289b_.log (deflated 90%)
2025-12-04T14:40:37.0491110Z   adding: test/test-reports/inductor.test_layout_optim_1.1_85312b2aa31c9171_.log (deflated 50%)
2025-12-04T14:40:37.0491710Z   adding: test/test-reports/dynamo.test_exc_1.1_e6b03d521cd15643_.log (deflated 72%)
2025-12-04T14:40:37.0498829Z   adding: test/test-reports/inductor.test_aot_inductor_arrayref_1.2_f8b4577ce160ed2e_.log (deflated 90%)
2025-12-04T14:40:37.0499361Z   adding: test/test-reports/inductor.test_halide_1.1_2220e392a986bf8c_.log (deflated 8%)
2025-12-04T14:40:37.0500055Z   adding: test/test-reports/inductor.test_deterministic_1.3_31baf2894918a3e4_.log (deflated 81%)
2025-12-04T14:40:37.0500755Z   adding: test/test-reports/dynamo.test_deque_reconstruct_1.1_5be42757c28bbc33_.log (deflated 63%)
2025-12-04T14:40:37.0501443Z   adding: test/test-reports/inductor.test_inductor_annotations_1.1_acd21ad590bd7056_.log (deflated 59%)
2025-12-04T14:40:37.0502133Z   adding: test/test-reports/inductor.test_compile_worker_1.1_22d83b26da38be9a_.log (deflated 76%)
2025-12-04T14:40:37.0502814Z   adding: test/test-reports/dynamo.test_fx_passes_pre_grad_1.1_2cb42dcf29e7ede2_.log (deflated 53%)
2025-12-04T14:40:37.0514825Z   adding: test/test-reports/inductor.test_fp8_1.1_041887d0b8d7fee8_.log (deflated 94%)
2025-12-04T14:40:37.0516296Z   adding: test/test-reports/inductor.test_flex_flash_1.1_143acf3e8eed598e_.log (deflated 92%)
2025-12-04T14:40:37.0517288Z   adding: test/test-reports/inductor.test_segmented_tree_1.1_84df512657bb7938_.log (deflated 74%)
2025-12-04T14:40:37.0518273Z   adding: test/test-reports/inductor.test_kernel_optimization_1.1_55e099cc3f4c3e00_.log (deflated 54%)
2025-12-04T14:40:37.0518942Z   adding: test/test-reports/inductor.test_metrics_1.1_8516a4d7ea2a79fb_.log (deflated 64%)
2025-12-04T14:40:37.0520275Z   adding: test/test-reports/export.test_unflatten_training_ir_1.1_381cd1c16fe9e11b_.log (deflated 85%)
2025-12-04T14:40:37.0531134Z   adding: test/test-reports/inductor.test_triton_kernels_1.1_cd88835deb58b6d4_.log (deflated 92%)
2025-12-04T14:40:37.0531669Z   adding: test/test-reports/inductor.test_lookup_table_1.1_5735430f25b64d36_.log (deflated 6%)
2025-12-04T14:40:37.0532202Z   adding: test/test-reports/inductor.test_cutedsl_template_1.1_aa2751f33400ad4f_.log (deflated 77%)
2025-12-04T14:40:37.0532958Z   adding: test/test-reports/inductor.test_benchmark_fusion_1.1_e074574f7d298815_.log (deflated 75%)
2025-12-04T14:40:37.0569889Z   adding: test/test-reports/export.test_serdes_1.1_e8663fe68509169e_.log (deflated 91%)
2025-12-04T14:40:37.0570530Z   adding: test/test-reports/inductor.test_control_deps_1.1_8b7ff51ad1c7850a_.log (deflated 51%)
2025-12-04T14:40:37.0571171Z   adding: test/test-reports/inductor.test_benchmarking_1.1_829d05e5f8b311c4_.log (deflated 79%)
2025-12-04T14:40:37.0572004Z   adding: test/test-reports/inductor.test_helion_kernels_1.1_570e1b539a64331c_.log (deflated 57%)
2025-12-04T14:40:37.0572666Z   adding: test/test-reports/inductor.test_quantization_1.1_121915a11d71492f_.log (deflated 56%)
2025-12-04T14:40:37.0573305Z   adding: test/test-reports/inductor.test_best_config_1.1_4e63c9241f189413_.log (deflated 53%)
2025-12-04T14:40:37.0573796Z   adding: test/test-reports/export.test_tools_1.1_3c90f78b66d684d8_.log (deflated 63%)
2025-12-04T14:40:37.0580283Z   adding: test/test-reports/inductor.test_compiled_optimizers_1.3_97f7ba3c63654c1d_.log (deflated 92%)
2025-12-04T14:40:37.0581893Z   adding: test/test-reports/torch_np.numpy_tests.lib.test_arraysetops_1.1_330f28fa6e8cce7d_.log (deflated 89%)
2025-12-04T14:40:37.0583445Z   adding: test/test-reports/inductor.test_aot_inductor_custom_ops_1.1_dddd6a5d20b7cc0a_.log (deflated 88%)
2025-12-04T14:40:37.0975805Z   adding: test/test-reports/inductor.test_control_flow_4.5_21e8d4f49de459e9_.log (deflated 97%)
2025-12-04T14:40:37.0976464Z   adding: test/test-reports/dynamo.test_cudagraphs_1.1_90173d668b3e025f_.log (deflated 68%)
2025-12-04T14:40:37.0977076Z   adding: test/test-reports/inductor.test_alignment_1.1_a164af8fc87f74b5_.log (deflated 73%)
2025-12-04T14:40:37.0978954Z   adding: test/test-reports/dynamo.test_guard_serialization_1.1_1b7b87cb0989e9eb_.log (deflated 84%)
2025-12-04T14:40:37.0979634Z   adding: test/test-reports/inductor.test_needs_exact_strides_1.1_5109866c57595440_.log (deflated 58%)
2025-12-04T14:40:37.0980488Z   adding: test/test-reports/inductor.test_auto_functionalize_1.1_68ce77f985508c50_.log (deflated 85%)
2025-12-04T14:40:37.0981470Z   adding: test/test-reports/dynamo.test_modes_1.1_8c4add47a33e36b7_.log (deflated 81%)
2025-12-04T14:40:37.0982122Z   adding: test/test-reports/inductor.test_custom_partitioner_fn_1.1_c3b6cd6a5d7fdcc1_.log (deflated 55%)
2025-12-04T14:40:37.0982788Z   adding: test/test-reports/dynamo.test_debug_utils_1.1_95c2c78decfe726b_.log (deflated 62%)
2025-12-04T14:40:37.0983393Z   adding: test/test-reports/dynamo.test_base_hop_1.1_b2087f66aa3d673c_.log (deflated 71%)
2025-12-04T14:40:37.0993356Z   adding: test/test-reports/dynamo.test_export_1.1_b7c3be727fa89598_.log (deflated 91%)
2025-12-04T14:40:37.0993975Z   adding: test/test-reports/dynamo.test_python_dispatcher_1.1_786a010f0f7c3940_.log (deflated 69%)
2025-12-04T14:40:37.0994585Z   adding: test/test-reports/export.test_swap_1.1_95c916183e2c0305_.log (deflated 78%)
2025-12-04T14:40:37.0995746Z   adding: test/test-reports/export.test_unflatten_1.1_d630949213564262_.log (deflated 78%)
2025-12-04T14:40:37.0996398Z   adding: test/test-reports/dynamo.test_verify_correctness_1.1_d2d881eebc4cfc16_.log (deflated 67%)
2025-12-04T14:40:37.0999491Z   adding: test/test-reports/test_dlpack_1.1_1ec0373c5e8d3363_.log (deflated 91%)
2025-12-04T14:40:37.1000576Z   adding: test/test-reports/dynamo.test_wrap_inductor_compiled_regions_1.1_3fbc3d993fa8b554_.log (deflated 81%)
2025-12-04T14:40:37.1001435Z   adding: test/test-reports/profiler.test_torch_tidy_1.1_59c0964fd7f7a67f_.log (deflated 76%)
2025-12-04T14:40:37.1002173Z   adding: test/test-reports/dynamo.test_cudagraphs_expandable_segments_1.1_461bf64d7b370157_.log (deflated 72%)
2025-12-04T14:40:37.1006420Z   adding: test/test-reports/inductor.test_caching_1.1_ee4444a2502ab159_.log (deflated 93%)
2025-12-04T14:40:37.1007134Z   adding: test/test-reports/dynamo.test_reorder_logs_1.1_bce7080c1339fb4f_.log (deflated 78%)
2025-12-04T14:40:37.1012167Z   adding: test/test-reports/dynamo.test_subclasses_1.1_0f5f75f18480ff60_.log (deflated 89%)
2025-12-04T14:40:37.1012712Z   adding: test/test-reports/dynamo.test_comptime_1.1_ee7f00d5f391ca6c_.log (deflated 71%)
2025-12-04T14:40:37.1013251Z   adding: test/test-reports/test_privateuseone_python_backend_1.1_d1fcfb50f1d5d34a_.log (deflated 58%)
2025-12-04T14:40:37.1013918Z   adding: test/test-reports/functorch.test_rearrange_1.1_f58f64fb207f9ea4_.log (deflated 71%)
2025-12-04T14:40:37.1014543Z   adding: test/test-reports/functorch.test_parsing_1.1_ab4f4097a5615c37_.log (deflated 73%)
2025-12-04T14:40:37.1015547Z   adding: test/test-reports/test_varlen_attention_1.1_7b4553019fa7c3b1_.log (deflated 85%)
2025-12-04T14:40:37.1016130Z   adding: test/test-reports/test_mkl_verbose_1.1_8c6cbee907023829_.log (deflated 54%)
2025-12-04T14:40:37.1027808Z   adding: test/test-reports/test_cpp_api_parity_1.1_c465bf1c09b2866a_.log (deflated 94%)
2025-12-04T14:40:37.1028387Z   adding: test/test-reports/test_autoload_1.1_7ada199d937afcb8_.log (deflated 50%)
2025-12-04T14:40:37.1028992Z   adding: test/test-reports/nn.attention.test_open_registry_1.1_7d49315786a6f063_.log (deflated 58%)
2025-12-04T14:40:37.1029780Z   adding: test/test-reports/xpu.test_fusion_1.1_008b7f4febfa87c6_.log (deflated 48%)
2025-12-04T14:40:37.1101857Z   adding: test/test-reports/test_foreach_1.1_76ff14b11afb6f5e_.log (deflated 95%)
2025-12-04T14:40:37.1104057Z   adding: test/test-reports/test_pytree_1.1_d855357a70a16b8f_.log (deflated 87%)
2025-12-04T14:40:37.1104651Z   adding: test/test-reports/test_namedtuple_return_api_1.1_7b0afa1d6fad9127_.log (deflated 60%)
2025-12-04T14:40:37.1105311Z   adding: test/test-reports/profiler.test_record_function_1.1_27a3a5f054eb4856_.log (deflated 70%)
2025-12-04T14:40:37.1105970Z   adding: test/test-reports/test_compile_benchmark_util_1.1_37f34d6b957df075_.log (deflated 53%)
2025-12-04T14:40:37.1106573Z   adding: test/test-reports/lazy.test_reuse_ir_1.1_27a380ba8ffdf251_.log (deflated 59%)
2025-12-04T14:40:37.1107205Z   adding: test/test-reports/test_set_default_mobile_cpu_allocator_1.1_d39ef648b820daa5_.log (deflated 59%)
2025-12-04T14:40:37.1113916Z   adding: test/test-reports/test_fake_tensor_1.1_e61dd7d666716729_.log (deflated 90%)
2025-12-04T14:40:37.1344171Z   adding: test/test-reports/test_binary_ufuncs_1.1_62e3371632b5f5b8_.log (deflated 96%)
2025-12-04T14:40:37.1549907Z   adding: test/test-reports/test_meta_2.4_5af3c72f6896ae42_.log (deflated 94%)
2025-12-04T14:40:37.1576795Z   adding: test/test-reports/test_fx_1.1_c9e7dc6459df6851_.log (deflated 92%)
2025-12-04T14:40:37.1607220Z   adding: test/test-reports/test_ops_gradients_2.4_85b1c0ac8f503e20_.log (deflated 93%)
2025-12-04T14:40:37.1619011Z   adding: test/test-reports/test_nestedtensor_3.4_dff7d83ca5d5d6c7_.log (deflated 91%)
2025-12-04T14:40:37.1787592Z   adding: test/test-reports/functorch.test_control_flow_4.4_38ed588b114098c9_.log (deflated 96%)
2025-12-04T14:40:37.1792634Z   adding: test/test-reports/complex_tensor.test_complex_tensor_3.3_18c49b70545b8444_.log (deflated 93%)
2025-12-04T14:40:37.1793324Z   adding: test/test-reports/optim.test_optim_1.1_96914e6399c32275_.log (deflated 7%)
2025-12-04T14:40:37.1794148Z   adding: test/test-reports/test_functional_autograd_benchmark_1.1_a799b1a1c12db472_.log (deflated 87%)
2025-12-04T14:40:37.1796355Z   adding: test/test-reports/torch_np.numpy_tests.fft.test_pocketfft_1.1_de16653e9fff9a91_.log (deflated 90%)
2025-12-04T14:40:37.1821653Z   adding: test/test-reports/functorch.test_ops_1.9_c0eee613fccab590_.log (deflated 91%)
2025-12-04T14:40:37.1845930Z   adding: test/test-reports/functorch.test_ops_6.9_28c20a9b783bf56f_.log (deflated 91%)
2025-12-04T14:40:37.1846740Z   adding: test/test-reports/torch_np.numpy_tests.core.test_getlimits_1.1_82959dff790c271d_.log (deflated 77%)
2025-12-04T14:40:37.1853297Z   adding: test/test-reports/torch_np.test_ndarray_methods_1.1_6320f1f29b59eecd_.log (deflated 94%)
2025-12-04T14:40:37.1858481Z   adding: test/test-reports/test_view_ops_1.1_cb0863e2f971ce65_.log (deflated 91%)
2025-12-04T14:40:37.1917446Z   adding: test/test-reports/test_nn_1.1_a9c5853bd400a590_.log (deflated 95%)
2025-12-04T14:40:37.1995958Z   adding: test/test-reports/test_reductions_1.1_99f27656cb9f6489_.log (deflated 96%)
2025-12-04T14:40:37.1997175Z   adding: test/test-reports/torch_np.numpy_tests.lib.test_index_tricks_1.1_a3265ffbea1c2963_.log (deflated 85%)
2025-12-04T14:40:37.1998648Z   adding: test/test-reports/test_jit_autocast_1.1_125de5112eb33739_.log (deflated 81%)
2025-12-04T14:40:37.2002462Z   adding: test/test-reports/nn.test_pooling_1.1_2fc4f4b87243805c_.log (deflated 90%)
2025-12-04T14:40:37.2006459Z   adding: test/test-reports/nn.test_embedding_1.1_0d2f816320a0d140_.log (deflated 93%)
2025-12-04T14:40:37.2007083Z   adding: test/test-reports/test_xnnpack_integration_1.1_cee22bfc3e04069b_.log (deflated 72%)
2025-12-04T14:40:37.2008197Z   adding: test/test-reports/test_cuda_trace_1.1_a09bfc4c91236df9_.log (deflated 92%)
2025-12-04T14:40:37.2010278Z   adding: test/test-reports/test_native_mha_1.1_0baa40d098e975c1_.log (deflated 93%)
2025-12-04T14:40:37.2011296Z   adding: test/test-reports/torch_np.numpy_tests.core.test_numerictypes_1.1_2b678d8fcdaea099_.log (deflated 86%)
2025-12-04T14:40:37.2012427Z   adding: test/test-reports/test_cuda_nvml_based_avail_1.1_f1988982a9fd6374_.log (deflated 92%)
2025-12-04T14:40:37.2013119Z   adding: test/test-reports/test_function_schema_1.1_77625ad433e579dc_.log (deflated 77%)
2025-12-04T14:40:37.2013764Z   adding: test/test-reports/test_accelerator_1.1_df51eee5e1970e59_.log (deflated 72%)
2025-12-04T14:40:37.2014671Z   adding: test/test-reports/nn.test_init_1.1_83fc7b147f0c612f_.log (deflated 78%)
2025-12-04T14:40:37.2015670Z   adding: test/test-reports/torch_np.test_scalars_0D_arrays_1.1_ed0b8e715c6c1568_.log (deflated 85%)
2025-12-04T14:40:37.2018005Z   adding: test/test-reports/torch_np.numpy_tests.core.test_scalar_methods_1.1_ab75aa6938d6d2cf_.log (deflated 92%)
2025-12-04T14:40:37.2018618Z   adding: test/test-reports/torch_np.numpy_tests.fft.test_helper_1.1_692efc716c4081b6_.log (deflated 69%)
2025-12-04T14:40:37.2019153Z   adding: test/test-reports/test_mobile_optimizer_1.1_9983f8f253c5b059_.log (deflated 67%)
2025-12-04T14:40:37.2041127Z   adding: test/test-reports/test_overrides_1.1_89ffca0310ffa68a_.log (deflated 93%)
2025-12-04T14:40:37.2041734Z   adding: test/test-reports/torch_np.test_function_base_1.1_500584d725d1f2e7_.log (deflated 55%)
2025-12-04T14:40:37.2050876Z   adding: test/test-reports/test_type_promotion_1.1_6db001c785be877d_.log (deflated 94%)
2025-12-04T14:40:37.2074690Z ##[group]Run # Remove any previous debugging artifacts if they exist
2025-12-04T14:40:37.2075063Z [36;1m# Remove any previous debugging artifacts if they exist[0m
2025-12-04T14:40:37.2075365Z [36;1mrm -f debug-*.zip[0m
2025-12-04T14:40:37.2075581Z [36;1mif [ -d 'test/debug' ]; then[0m
2025-12-04T14:40:37.2075837Z [36;1m  zip -r "debug-${FILE_SUFFIX}.zip" test/debug[0m
2025-12-04T14:40:37.2076091Z [36;1mfi[0m
2025-12-04T14:40:37.2083033Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T14:40:37.2083315Z env:
2025-12-04T14:40:37.2083474Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:37.2083672Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:37.2083899Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:37.2084295Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:37.2084792Z   FILE_SUFFIX: test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862
2025-12-04T14:40:37.2085160Z ##[endgroup]
2025-12-04T14:40:37.2161487Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T14:40:37.2161737Z with:
2025-12-04T14:40:37.2161922Z   s3-bucket: gha-artifacts
2025-12-04T14:40:37.2162164Z   s3-prefix: pytorch/pytorch/19922768520/1/artifact

2025-12-04T14:40:37.2162421Z   retention-days: 14
2025-12-04T14:40:37.2162607Z   if-no-files-found: warn
2025-12-04T14:40:37.2162808Z   path: test-jsons-*.zip
2025-12-04T14:40:37.2162996Z   name: artifact
2025-12-04T14:40:37.2163169Z   region: us-east-1
2025-12-04T14:40:37.2163337Z env:
2025-12-04T14:40:37.2163494Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:37.2163694Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:37.2163924Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:37.2164321Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:37.2164735Z ##[endgroup]
2025-12-04T14:40:37.5394632Z NOTE: s3-prefix specified, ignoring name parameter
2025-12-04T14:40:37.5395106Z With the provided path, there will be 1 file uploaded
2025-12-04T14:40:37.5395527Z Uploading to s3 prefix: pytorch/pytorch/19922768520/1/artifact
2025-12-04T14:40:37.5450462Z Starting upload of test-jsons-test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862.zip
2025-12-04T14:40:37.7549405Z Finished upload of test-jsons-test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862.zip
2025-12-04T14:40:37.7779869Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T14:40:37.7780122Z with:
2025-12-04T14:40:37.7780299Z   s3-bucket: gha-artifacts
2025-12-04T14:40:37.7780550Z   s3-prefix: pytorch/pytorch/19922768520/1/artifact

2025-12-04T14:40:37.7781026Z   retention-days: 14
2025-12-04T14:40:37.7781217Z   if-no-files-found: error
2025-12-04T14:40:37.7781424Z   path: test-reports-*.zip
2025-12-04T14:40:37.7781617Z   name: artifact
2025-12-04T14:40:37.7781787Z   region: us-east-1
2025-12-04T14:40:37.7781956Z env:
2025-12-04T14:40:37.7782125Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:37.7782324Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:37.7782561Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:37.7782972Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:37.7783339Z ##[endgroup]
2025-12-04T14:40:38.0706484Z NOTE: s3-prefix specified, ignoring name parameter
2025-12-04T14:40:38.0706929Z With the provided path, there will be 1 file uploaded
2025-12-04T14:40:38.0707333Z Uploading to s3 prefix: pytorch/pytorch/19922768520/1/artifact
2025-12-04T14:40:38.0760928Z Starting upload of test-reports-test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862.zip
2025-12-04T14:40:38.2669276Z Finished upload of test-reports-test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862.zip
2025-12-04T14:40:38.2901084Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T14:40:38.2901345Z with:
2025-12-04T14:40:38.2901523Z   s3-bucket: gha-artifacts
2025-12-04T14:40:38.2901773Z   s3-prefix: pytorch/pytorch/19922768520/1/artifact

2025-12-04T14:40:38.2902034Z   retention-days: 14
2025-12-04T14:40:38.2902228Z   if-no-files-found: ignore
2025-12-04T14:40:38.2902438Z   path: logs-*.zip
2025-12-04T14:40:38.2902617Z   name: artifact
2025-12-04T14:40:38.2902789Z   region: us-east-1
2025-12-04T14:40:38.2902956Z env:
2025-12-04T14:40:38.2903118Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:38.2903314Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:38.2903561Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:38.2903958Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:38.2904341Z ##[endgroup]
2025-12-04T14:40:38.5845428Z NOTE: s3-prefix specified, ignoring name parameter
2025-12-04T14:40:38.5845876Z With the provided path, there will be 1 file uploaded
2025-12-04T14:40:38.5846278Z Uploading to s3 prefix: pytorch/pytorch/19922768520/1/artifact
2025-12-04T14:40:38.5899281Z Starting upload of logs-test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862.zip
2025-12-04T14:40:38.8064063Z Finished upload of logs-test-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu_57116084862.zip
2025-12-04T14:40:38.8297687Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T14:40:38.8297940Z with:
2025-12-04T14:40:38.8298117Z   s3-bucket: gha-artifacts
2025-12-04T14:40:38.8298364Z   s3-prefix: pytorch/pytorch/19922768520/1/artifact

2025-12-04T14:40:38.8298628Z   retention-days: 14
2025-12-04T14:40:38.8298833Z   if-no-files-found: ignore
2025-12-04T14:40:38.8299039Z   path: debug-*.zip
2025-12-04T14:40:38.8299210Z   name: artifact
2025-12-04T14:40:38.8299402Z   region: us-east-1
2025-12-04T14:40:38.8299571Z env:
2025-12-04T14:40:38.8299734Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:38.8299929Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:38.8300166Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:38.8300569Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:38.8300927Z ##[endgroup]
2025-12-04T14:40:39.1187023Z No files were found with the provided path: debug-*.zip. No artifacts will be uploaded.
2025-12-04T14:40:39.1426290Z ##[group]Run # shellcheck disable=SC2156
2025-12-04T14:40:39.1426603Z [36;1m# shellcheck disable=SC2156[0m
2025-12-04T14:40:39.1427035Z [36;1mfind . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \;[0m
2025-12-04T14:40:39.1434957Z shell: /usr/bin/bash -e {0}
2025-12-04T14:40:39.1435164Z env:
2025-12-04T14:40:39.1435320Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:39.1435519Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:39.1435928Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:39.1436325Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:39.1436678Z ##[endgroup]
2025-12-04T14:40:39.4888803Z ##[group]Run seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a
2025-12-04T14:40:39.4889192Z with:
2025-12-04T14:40:39.4889481Z   name: coredumps-default-2-5-lf.linux.g6.4xlarge.experimental.nvidia.gpu
2025-12-04T14:40:39.4889816Z   retention-days: 14
2025-12-04T14:40:39.4890002Z   if-no-files-found: ignore
2025-12-04T14:40:39.4890212Z   path: ./**/core.[1-9]*
2025-12-04T14:40:39.4890402Z   s3-bucket: gha-artifacts
2025-12-04T14:40:39.4890592Z   region: us-east-1
2025-12-04T14:40:39.4890751Z env:
2025-12-04T14:40:39.4890893Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:39.4891074Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:39.4891298Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:39.4891707Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:39.4892073Z ##[endgroup]
2025-12-04T14:40:49.8824696Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded.
2025-12-04T14:40:49.9175357Z Prepare all required actions
2025-12-04T14:40:49.9175704Z Getting action download info
2025-12-04T14:40:50.0938430Z Download action repository 'actions/setup-python@v6' (SHA:83679a892e2d95755f2dac6acb0bfd1e9ac5d548)
2025-12-04T14:40:50.5079722Z ##[group]Run ./.github/actions/upload-utilization-stats
2025-12-04T14:40:50.5080106Z with:
2025-12-04T14:40:50.5080283Z   job_id: 57116084862
2025-12-04T14:40:50.5080714Z   job_name: linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 2, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check)
2025-12-04T14:40:50.5081175Z   workflow_name: trunk
2025-12-04T14:40:50.5081366Z   workflow_run_id: 19922768520
2025-12-04T14:40:50.5081586Z   workflow_attempt: 1
2025-12-04T14:40:50.5081791Z env:
2025-12-04T14:40:50.5081955Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:50.5082161Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:50.5082413Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:50.5082851Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:50.5083212Z ##[endgroup]
2025-12-04T14:40:50.5115575Z ##[group]Run actions/setup-python@v6
2025-12-04T14:40:50.5115813Z with:
2025-12-04T14:40:50.5115984Z   python-version: 3.10
2025-12-04T14:40:50.5116181Z   check-latest: false
2025-12-04T14:40:50.5116465Z   token: ***
2025-12-04T14:40:50.5116647Z   update-environment: true
2025-12-04T14:40:50.5116858Z   allow-prereleases: false
2025-12-04T14:40:50.5117359Z   freethreaded: false
2025-12-04T14:40:50.5117561Z env:
2025-12-04T14:40:50.5117724Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:50.5117927Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:50.5118172Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:50.5118594Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:50.5118959Z ##[endgroup]
2025-12-04T14:40:50.9027895Z ##[group]Installed versions
2025-12-04T14:40:50.9036308Z Version 3.10 was not found in the local cache
2025-12-04T14:40:50.9192358Z (node:305767) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
2025-12-04T14:40:50.9193085Z (Use `node --trace-deprecation ...` to show where the warning was created)
2025-12-04T14:40:51.2552450Z ##[error]The version '3.10' with architecture 'x64' was not found for this operating system.
The list of all available versions can be found here: https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json
2025-12-04T14:40:51.2747469Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main
2025-12-04T14:40:51.2747831Z with:
2025-12-04T14:40:51.2747976Z env:
2025-12-04T14:40:51.2748136Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:51.2748551Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:51.2748793Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:51.2749180Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:51.2749532Z ##[endgroup]
2025-12-04T14:40:51.2763212Z ##[group]Run set -eou pipefail
2025-12-04T14:40:51.2763447Z [36;1mset -eou pipefail[0m
2025-12-04T14:40:51.2763633Z [36;1m[0m
2025-12-04T14:40:51.2763909Z [36;1mecho "Holding runner for 2 hours until all ssh sessions have logged out"[0m
2025-12-04T14:40:51.2764337Z [36;1mfor _ in $(seq 1440); do[0m
2025-12-04T14:40:51.2764666Z [36;1m    # Break if no ssh session exists anymore[0m
2025-12-04T14:40:51.2764909Z [36;1m    if [ "$(who)" = "" ]; then[0m
2025-12-04T14:40:51.2765155Z [36;1m      break[0m
2025-12-04T14:40:51.2765314Z [36;1m    fi[0m
2025-12-04T14:40:51.2765474Z [36;1m    echo "."[0m
2025-12-04T14:40:51.2765651Z [36;1m    sleep 5[0m
2025-12-04T14:40:51.2765805Z [36;1mdone[0m
2025-12-04T14:40:51.2773350Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T14:40:51.2773631Z env:
2025-12-04T14:40:51.2773795Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:51.2773983Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:51.2774211Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:51.2774601Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:51.2774948Z ##[endgroup]
2025-12-04T14:40:51.2802118Z Holding runner for 2 hours until all ssh sessions have logged out
2025-12-04T14:40:51.2894626Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T14:40:51.2895027Z [36;1m# ignore expansion of "docker ps -q" since it could be empty[0m
2025-12-04T14:40:51.2895338Z [36;1m# shellcheck disable=SC2046[0m
2025-12-04T14:40:51.2895590Z [36;1mdocker stop $(docker ps -q) || true[0m
2025-12-04T14:40:51.2895844Z [36;1m# Prune all of the docker images[0m
2025-12-04T14:40:51.2896085Z [36;1mdocker system prune -af[0m
2025-12-04T14:40:51.2903189Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T14:40:51.2903471Z env:
2025-12-04T14:40:51.2903632Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:40:51.2903820Z   HAS_NVIDIA_GPU: true
2025-12-04T14:40:51.2904050Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:40:51.2904443Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:40:51.2904798Z ##[endgroup]
2025-12-04T14:41:02.4769499Z e29498c26bf7
2025-12-04T14:41:09.0091776Z Deleted Containers:
2025-12-04T14:41:09.0092203Z e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:41:09.0092518Z 
2025-12-04T14:41:20.3041143Z Deleted Images:
2025-12-04T14:41:20.3041595Z untagged: public.ecr.aws/docker/library/python:3.13
2025-12-04T14:41:20.3042275Z untagged: public.ecr.aws/docker/library/python@sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0
2025-12-04T14:41:20.3043030Z deleted: sha256:44438aecfedf7b6086fce506dae0db5ba7fc0027f9b743f1a75a6b5cbc7de70a
2025-12-04T14:41:20.3043833Z deleted: sha256:6f09a1f5d8a107c2532fbd116e75116cb75fa77b1a7d72d3bdf1ac12de152acd
2025-12-04T14:41:20.3044404Z deleted: sha256:fe5f3ac0be086125eb1e3cd10cc33e8e426f4e079381f7ce5a987b626e99fa67
2025-12-04T14:41:20.3044976Z deleted: sha256:79dd2061a22cf919cfc4f1f02704bfda09afadb017265e670ee54441d296c06c
2025-12-04T14:41:20.3045559Z deleted: sha256:9447ad402aafdbee17e999b0ec84ad89c2646dbebf054d469d4f8bee77f66212
2025-12-04T14:41:20.3046054Z deleted: sha256:7a4909f3c1975be52292f53107495ee1b41c17494918767ccedf1cf1688ae318
2025-12-04T14:41:20.3046477Z deleted: sha256:3474923d97f1f498237650a7d51bd4aea37d5e6b9d8a778777920584af5dd560
2025-12-04T14:41:20.3047206Z deleted: sha256:683afd1773444401a9cbd24842ee5d9154a11abb4fab63ddea5c03df788597ee
2025-12-04T14:41:20.3048066Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T14:41:20.3049158Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image@sha256:ba21003510dba4bdeed83df81a56fa468e0ee1b612a9445ae1f402a280804f97
2025-12-04T14:41:20.3049780Z deleted: sha256:add7313791033822205cdb3cf32096534b2cfaa4855bd48119b59000bfe00301
2025-12-04T14:41:20.3050218Z deleted: sha256:85a76b7bf29ad34eb76cce6f46af5d49a58b6272f80f983d5c769e82c7749301
2025-12-04T14:41:20.3050660Z deleted: sha256:0882f3ce59ff5ae30195ee4b059fc713e13eda107a3a7814a4616ac9058a30a4
2025-12-04T14:41:20.3051095Z deleted: sha256:64ba5b9344c11a3e4729136076830b90ac4cf1554046edb1bd4f0784b66ebd9b
2025-12-04T14:41:20.3051522Z deleted: sha256:88213c59cf461a65ab9b6cb07b4195dc9d41b5241c152daa002c7b3112e09124
2025-12-04T14:41:20.3051959Z deleted: sha256:4c0f83afa802ffbc05ebaf1aa50e48a2447c7c295549a6dded80ac63437906ca
2025-12-04T14:41:20.3052398Z deleted: sha256:6f7ec74460e8fb070c8209949095ea3be5f4e2fd69c9f750cd39ac4093f5e64b
2025-12-04T14:41:20.3052831Z deleted: sha256:d6928b0d1021b31942fdcb64e5eb4a34682de66e959dd424ed6ed02c29cd706d
2025-12-04T14:41:20.3053265Z deleted: sha256:4e9fbcb1705a6351bb34dd320558752614308636b94fd9ae6f26063e3deadc0a
2025-12-04T14:41:20.3053688Z deleted: sha256:43aabd0201f48712f21758071352dea029b4de37be08b2e2197706856a9ecbf2
2025-12-04T14:41:20.3054114Z deleted: sha256:940a98dec78303f0548beb1033242a45e9097607ef3e55c8b949b69b73d1b95e
2025-12-04T14:41:20.3054536Z deleted: sha256:d2849fa0e0411cf66e4408831d70e38838afb55b11a80c1c4d8aa0ae7dc9ca40
2025-12-04T14:41:20.3055014Z deleted: sha256:14f40d23c20c7e562623f89deb376520296758bc39dd3c77284049b84ebd8a31
2025-12-04T14:41:20.3055536Z deleted: sha256:a8ccba61f90ca097cb391d0f4fbed0d9f821d06b00e28f7332e9e2dcfcbac4ca
2025-12-04T14:41:20.3056138Z deleted: sha256:91b2060d290547d3b517d4a11d994bbe23f4560b5546cb91918ca1828dde6be1
2025-12-04T14:41:20.3056564Z deleted: sha256:b42a184755715dcfead7fad655a127433541d316d9628f5f730ff17ad5f8071c
2025-12-04T14:41:20.3056992Z deleted: sha256:aa5b4f3c9169061dc3c6da0e677e8a86f11ecb0a3f9fb4861ab3d8c04379775c
2025-12-04T14:41:20.3057436Z deleted: sha256:b4dcf450081a48d77fea0a21b8d810a69c03608a595e754fe7d365058d0579b7
2025-12-04T14:41:20.3057876Z deleted: sha256:4f7fe12d3d4f5bf890c7ada4ce16f17a105472aa6509a778f917dcce2f28174b
2025-12-04T14:41:20.3058310Z deleted: sha256:2d1d5a74182594f9a8553df00fdcfc809dba407bcd6700d667f862cbe9d555ce
2025-12-04T14:41:20.3058750Z deleted: sha256:d901e2f5d449aeed16b727bdcc11fc0e0f6c30c8fc5c39ac7eeac8a74d9d176c
2025-12-04T14:41:20.3059220Z deleted: sha256:a04df2603bd12372c6632469a9a81ebc4a8d677452c250672b9692884fa6a452
2025-12-04T14:41:20.3059644Z deleted: sha256:f438a6b52273a552dc3820a55c74c53a62a0eae9f2a7d21b37125add7d71639f
2025-12-04T14:41:20.3060073Z deleted: sha256:d4b09517e9518d709ac98b0ae6f8446ec9ac51688253607b1fca67aa2c87b3f4
2025-12-04T14:41:20.3060501Z deleted: sha256:c1fa38335237f5e7263e39d3d3de98215bcfbbb12b826955c02e149bf68efd13
2025-12-04T14:41:20.3060932Z deleted: sha256:c898d20a30de901fca74d7611663b17ab48e1726a11e031e40548ed16ee81877
2025-12-04T14:41:20.3061373Z deleted: sha256:3baceec7096518fcc10696feba551639d698b3145c2fc09cac927bb60c0fd751
2025-12-04T14:41:20.3061805Z deleted: sha256:5245aaaa3d5c3a19f76b9a6c920bd82d1a0ff5289f87c8c109652089709d9b3b
2025-12-04T14:41:20.3062231Z deleted: sha256:f05cc789b95246938c377f474c41187965b89ceac0250e7d5124bec32153f447
2025-12-04T14:41:20.3062668Z deleted: sha256:07ec4fc008de4e7a2c794ec7094cc72e0d287c04c8b2156163aee0bae147fe2d
2025-12-04T14:41:20.3063103Z deleted: sha256:c6302601ad5fde573c1f8c900250478fca7fdc6907d8fd4fae651b94b4d9264d
2025-12-04T14:41:20.3063534Z deleted: sha256:cc5e955ee1dc54931f02606c5ea87aae14f03b5d764492be611480ab041f2882
2025-12-04T14:41:20.3063965Z deleted: sha256:f21c03518996d98452338f4e80bcfd9b139a1dab155f4830be0d3f623035269f
2025-12-04T14:41:20.3064523Z deleted: sha256:519ca6f1279f7886f25f0005527cfa627deebbc5b7d7cdbfa7ef962bcfc4c26d
2025-12-04T14:41:20.3064960Z deleted: sha256:0ef990495216807d0175b192045be3f617e72331bc373b3434807f41bf69168d
2025-12-04T14:41:20.3065381Z deleted: sha256:7093edf7319e1f0e01654c3224e32c8dede5b948d106e0b9b03cbf0bb1091e33
2025-12-04T14:41:20.3065890Z deleted: sha256:c478161e058e2f4041555c3e880b95ee1ee047938dc58549a3a88135740996ae
2025-12-04T14:41:20.3066311Z deleted: sha256:9bb853b0d938cd7c36a80ce8ee40653f2c0ff92719209b11beb03acc8855ce3e
2025-12-04T14:41:20.3066738Z deleted: sha256:fdf2ace71a78ce6910ef9c4b073c195531da47022443b606bb92dcd6499b6afc
2025-12-04T14:41:20.3067193Z deleted: sha256:576c2b3770d871937d3cfb7014328bcb4bd1aed0c28bc438764b3bfdac4c1ac2
2025-12-04T14:41:20.3067624Z deleted: sha256:878e92b9cb82de09ac14a9d5f3f7bc2411a799b6f54d0d64b78c2bb4d1fdc0fc
2025-12-04T14:41:20.3068053Z deleted: sha256:85c8c3b98b65a6695f988a10cc66c981d73a3ef03eda15b8e14d227b50b56300
2025-12-04T14:41:20.3068488Z deleted: sha256:ce2ab3ba07794f9ee95d6ea7de6dcd3d2aed96561f9a79192dd56ca5bf29313a
2025-12-04T14:41:20.3068922Z deleted: sha256:37a6e12976ca957286977e696e63012ab9821214b0483fe1a48d29dcb280508a
2025-12-04T14:41:20.3069352Z deleted: sha256:cd1d5d3dd7038144ca6fe961c0d4c8e705625ae0c36190ba8b3e9602abedad19
2025-12-04T14:41:20.3069782Z deleted: sha256:0e707276e0be2e0008b86d594fadc0d16444d66c4fb7227c56f144cbb3c2affd
2025-12-04T14:41:20.3070214Z deleted: sha256:22d4aad6a2ada91b341c1225a0f314042b8aeabef7568c5c019709b058bf070b
2025-12-04T14:41:20.3070655Z deleted: sha256:ee4adacf4e0933131d0275eddad406b3c8147e6cf07a292b99f1aff4b5355f33
2025-12-04T14:41:20.3071088Z deleted: sha256:43da0b9e7c0e18403dcb834e53628dc7c970ccb2dbd091878c0d7c0170dbc97f
2025-12-04T14:41:20.3071515Z deleted: sha256:00571684bdcd75beda15eb7d4e79b5458bc914350f9bb4d87fcdc97ad15e0da1
2025-12-04T14:41:20.3071957Z deleted: sha256:41615f09950259f1d75e82ef35b6fc53b18fe71ebff143744cfd51009d04349e
2025-12-04T14:41:20.3072403Z deleted: sha256:75ab34d2eed3c7915467a506ab6dab2711918fbabe94add2fb5c62780221ab0c
2025-12-04T14:41:20.3072853Z deleted: sha256:0a39ef2bebf44c1c3893d1e5fb42dad48b8fac7ca673141267ee967f85455e89
2025-12-04T14:41:20.3073287Z deleted: sha256:9b7d024e48ba1f9824a54597621b1b062cbc4aa41a77d81ca538d6b5c24a612c
2025-12-04T14:41:20.3073732Z deleted: sha256:392257172de6434c271bd93394218a91e9aa86d7c18abc2f2759317b9d5fb6de
2025-12-04T14:41:20.3074155Z deleted: sha256:6c3232860b930866a463a356124fc392c7e5f04895695229257e8c3e8a02711d
2025-12-04T14:41:20.3074572Z deleted: sha256:63dd55b807215e2fa6c715419ac0c5072d02dddc848dbf74bb7e77b906b5eaed
2025-12-04T14:41:20.3075014Z deleted: sha256:07a8738c1b4584db72ed9aa60f5274321eb0ba16263450da3a75df8326ebc25f
2025-12-04T14:41:20.3075439Z deleted: sha256:053fe2965b01281d12040ec1893e0d1aa77362a49ea9a1067402272c69dad9f5
2025-12-04T14:41:20.3075869Z deleted: sha256:7857fb5eb181c4e80262ecab60bdd3c266cf3d1409ceb76c05882609b416a8d3
2025-12-04T14:41:20.3076297Z deleted: sha256:752528477fc99089de3bd2c6da7b30cf34f2e901fe06d8fcfe685b411461e883
2025-12-04T14:41:20.3076734Z deleted: sha256:cce0210e2f4b042601813df03aa294a86b0c668fcfc75f4c63f6fa12b2952e15
2025-12-04T14:41:20.3077166Z deleted: sha256:f2bb405a26705ecd12d21380d26d9355d01db3a2175080fbdb468f2b5a25a76c
2025-12-04T14:41:20.3077603Z deleted: sha256:ad430120d4ffbaf97cd8d6de6ea8eefa4a8f80ec45f0b176c6b26bff0970fd33
2025-12-04T14:41:20.3078054Z deleted: sha256:225a4910baea7cc540ed43eeac75046293800ab0b8e0192b51e991c8cb50bcf3
2025-12-04T14:41:20.3078490Z deleted: sha256:a259945b0c3507f049fbac10fb3d3ffe43d45e83c91b80ae8cd1dafb855ad83c
2025-12-04T14:41:20.3078922Z deleted: sha256:862a98881b1d5adad5c21d01602773b894794097de80964ef8f47bcaadb43255
2025-12-04T14:41:20.3079343Z deleted: sha256:1cf6d3c8b6c2694b79a2d08719594903811c330a36a4c7a8a7153a350b53d292
2025-12-04T14:41:20.3079773Z deleted: sha256:232a1ae8b0fee817ff7838bb5986a2f38377d3b1dbbf5217b576af0f953b0844
2025-12-04T14:41:20.3080333Z deleted: sha256:c72c5705dabd6314423dd7d4fb260a20d5d9886b2ebce60d19e9d78c4a2335c2
2025-12-04T14:41:20.3080857Z deleted: sha256:296734cf81fd92c913884d058908598424ffe072676e38de289bbab83768c7bd
2025-12-04T14:41:20.3081274Z deleted: sha256:7c76040481b889847a1804021aeff07547eaa4ee706d6137db218d497a8fd9c1
2025-12-04T14:41:20.3081707Z deleted: sha256:d5e293f5b354e8cbcc6de893ea72cc632b02d8fdfbb08ec3127c4e9662f3ebff
2025-12-04T14:41:20.3082222Z deleted: sha256:f35a64e429c88e249645090f21fbe7dae108d98e0ab4ea13184f24b3fd66c315
2025-12-04T14:41:20.3082815Z deleted: sha256:ce6ae8d595c8e69115c51b1ce4f9a9158795d7b863b1cb53f21c39a87974d41b
2025-12-04T14:41:20.3083257Z deleted: sha256:8941abaee59400fb9b3a60765fea4a1fc2a6a447467a6d983e84c7f72494a450
2025-12-04T14:41:20.3083696Z deleted: sha256:ef53c29a9a2c2bc80ffdb9bfaf92842436b5755ec1ce828b9d11e5e27d656ea1
2025-12-04T14:41:20.3084134Z deleted: sha256:7a347fb0acb43f1c814f8c8ff21185e8b5cf64d7bc5988cea060f77d906e08b5
2025-12-04T14:41:20.3084562Z deleted: sha256:cc855dc9be79496e15175569dced2d13477e50b077a5fd3945f9bf50018880c1
2025-12-04T14:41:20.3084996Z deleted: sha256:f7a9946ada3d4786658bc0b643808bb32a9a45e4e90e30dc43ee19e2dbe24024
2025-12-04T14:41:20.3085435Z deleted: sha256:c22a9215f62812c1d2e32827f5221ff556c5b6702aadbdab6b87b8293f19635e
2025-12-04T14:41:20.3085853Z deleted: sha256:959a56746620012e37c1def1a83c5afb1e7c0adc59b021a28beb53c24df98032
2025-12-04T14:41:20.3086286Z deleted: sha256:31a0fff0695bf6100c17954be72eab2095b466d559c75c3faf2a17d8c41e6ebe
2025-12-04T14:41:20.3086722Z deleted: sha256:c15e2b5241b9e55af1b2593e544391b4b44d0505e6528e8f12425136e93b424c
2025-12-04T14:41:20.3087150Z deleted: sha256:73974f74b436f39a2fdb6461b1e3f7c3e41c73325776fa71d16b942a5b4a365b
2025-12-04T14:41:20.3087402Z 
2025-12-04T14:41:20.3087489Z Total reclaimed space: 40.14GB
2025-12-04T14:41:20.3135778Z ##[group]Run set +e
2025-12-04T14:41:20.3136045Z [36;1mset +e[0m
2025-12-04T14:41:20.3136233Z [36;1mset -x[0m
2025-12-04T14:41:20.3136395Z [36;1m[0m
2025-12-04T14:41:20.3136549Z [36;1mnvidia-smi[0m
2025-12-04T14:41:20.3136893Z [36;1m# NB: Surprisingly, nvidia-smi command returns successfully with return code 0 even in[0m
2025-12-04T14:41:20.3137381Z [36;1m# the case where the driver has already crashed as it still can get the driver version[0m
2025-12-04T14:41:20.3137840Z [36;1m# and some basic information like the bus ID.  However, the rest of the information[0m
2025-12-04T14:41:20.3138210Z [36;1m# would be missing (ERR!), for example:[0m
2025-12-04T14:41:20.3138444Z [36;1m#[0m
2025-12-04T14:41:20.3138666Z [36;1m# +-----------------------------------------------------------------------------+[0m
2025-12-04T14:41:20.3139042Z [36;1m# | NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |[0m
2025-12-04T14:41:20.3139441Z [36;1m# |-------------------------------+----------------------+----------------------+[0m
2025-12-04T14:41:20.3139820Z [36;1m# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |[0m
2025-12-04T14:41:20.3140220Z [36;1m# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |[0m
2025-12-04T14:41:20.3140561Z [36;1m# |                               |                      |               MIG M. |[0m
2025-12-04T14:41:20.3140821Z [36;1m# |===============================+======================+======================|[0m
2025-12-04T14:41:20.3141107Z [36;1m# |   0  ERR!                Off  | 00000000:00:1E.0 Off |                 ERR! |[0m
2025-12-04T14:41:20.3141450Z [36;1m# |ERR!  ERR! ERR!    ERR! / ERR! |   4184MiB / 23028MiB |    ERR!      Default |[0m
2025-12-04T14:41:20.3141767Z [36;1m# |                               |                      |                 ERR! |[0m
2025-12-04T14:41:20.3142067Z [36;1m# +-------------------------------+----------------------+----------------------+[0m
2025-12-04T14:41:20.3142323Z [36;1m#[0m
2025-12-04T14:41:20.3142534Z [36;1m# +-----------------------------------------------------------------------------+[0m
2025-12-04T14:41:20.3142848Z [36;1m# | Processes:                                                                  |[0m
2025-12-04T14:41:20.3143185Z [36;1m# |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |[0m
2025-12-04T14:41:20.3143491Z [36;1m# |        ID   ID                                                   Usage      |[0m
2025-12-04T14:41:20.3143748Z [36;1m# |=============================================================================|[0m
2025-12-04T14:41:20.3144206Z [36;1m# +-----------------------------------------------------------------------------+[0m
2025-12-04T14:41:20.3144463Z [36;1m#[0m
2025-12-04T14:41:20.3144725Z [36;1m# This should be reported as a failure instead as it will guarantee to fail when[0m
2025-12-04T14:41:20.3145096Z [36;1m# Docker tries to run with --gpus all[0m
2025-12-04T14:41:20.3145324Z [36;1m#[0m
2025-12-04T14:41:20.3145576Z [36;1m# So, the correct check here is to query one of the missing piece of info like[0m
2025-12-04T14:41:20.3145932Z [36;1m# GPU name, so that the command can fail accordingly[0m
2025-12-04T14:41:20.3146286Z [36;1mnvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0[0m
2025-12-04T14:41:20.3146596Z [36;1mNVIDIA_SMI_STATUS=$?[0m
2025-12-04T14:41:20.3146775Z [36;1m[0m
2025-12-04T14:41:20.3147076Z [36;1m# These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action[0m
2025-12-04T14:41:20.3147532Z [36;1mif [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then[0m
2025-12-04T14:41:20.3147952Z [36;1m  echo "NVIDIA driver installation has failed, shutting down the runner..."[0m
2025-12-04T14:41:20.3148298Z [36;1m  .github/scripts/stop_runner_service.sh[0m
2025-12-04T14:41:20.3148525Z [36;1mfi[0m
2025-12-04T14:41:20.3148673Z [36;1m[0m
2025-12-04T14:41:20.3149057Z [36;1m# For runner with multiple GPUs, we also want to confirm that the number of GPUs are the[0m
2025-12-04T14:41:20.3149498Z [36;1m# power of 2, i.e. 1, 2, 4, or 8. This is to avoid flaky test issue when one GPU fails[0m
2025-12-04T14:41:20.3149876Z [36;1m# https://github.com/pytorch/test-infra/issues/4000[0m
2025-12-04T14:41:20.3150178Z [36;1mGPU_COUNT=$(nvidia-smi --list-gpus | wc -l)[0m
2025-12-04T14:41:20.3150429Z [36;1mNVIDIA_SMI_STATUS=$?[0m
2025-12-04T14:41:20.3150616Z [36;1m[0m
2025-12-04T14:41:20.3150917Z [36;1m# These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action[0m
2025-12-04T14:41:20.3151370Z [36;1mif [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then[0m
2025-12-04T14:41:20.3151765Z [36;1m  echo "NVIDIA driver installation has failed, shutting down the runner..."[0m
2025-12-04T14:41:20.3152110Z [36;1m  .github/scripts/stop_runner_service.sh[0m
2025-12-04T14:41:20.3152339Z [36;1mfi[0m
2025-12-04T14:41:20.3152491Z [36;1m[0m
2025-12-04T14:41:20.3152667Z [36;1m# Check the GPU count to be a power of 2[0m
2025-12-04T14:41:20.3153054Z [36;1mif [ "$GPU_COUNT" -le 8 ] && [ "$GPU_COUNT" -ne 1 ] && [ "$GPU_COUNT" -ne 2 ] && [ "$GPU_COUNT" -ne 4 ] && [ "$GPU_COUNT" -ne 8 ]; then[0m
2025-12-04T14:41:20.3153575Z [36;1m  echo "NVIDIA driver detects $GPU_COUNT GPUs. The runner has a broken GPU, shutting it down..."[0m
2025-12-04T14:41:20.3153970Z [36;1m  .github/scripts/stop_runner_service.sh[0m
2025-12-04T14:41:20.3154198Z [36;1mfi[0m
2025-12-04T14:41:20.3164453Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T14:41:20.3164739Z env:
2025-12-04T14:41:20.3164903Z   GIT_DEFAULT_BRANCH: main
2025-12-04T14:41:20.3165109Z   HAS_NVIDIA_GPU: true
2025-12-04T14:41:20.3165339Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T14:41:20.3165729Z   DOCKER_CONTAINER_ID: e29498c26bf7fe811b8c0d2a8327214fa8f0c3ca096f47f829d3f281406f9c82
2025-12-04T14:41:20.3166081Z ##[endgroup]
2025-12-04T14:41:20.3197184Z + nvidia-smi
2025-12-04T14:41:20.3408628Z Thu Dec  4 14:41:20 2025       
2025-12-04T14:41:20.3409000Z +-----------------------------------------------------------------------------------------+
2025-12-04T14:41:20.3409495Z | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
2025-12-04T14:41:20.3409944Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T14:41:20.3410402Z | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
2025-12-04T14:41:20.3410889Z | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
2025-12-04T14:41:20.3411612Z |                                         |                        |               MIG M. |
2025-12-04T14:41:20.3412123Z |=========================================+========================+======================|
2025-12-04T14:41:20.3578849Z |   0  NVIDIA L4                      On  |   00000000:35:00.0 Off |                    0 |
2025-12-04T14:41:20.3579296Z | N/A   31C    P8             12W /   72W |       0MiB /  23034MiB |      0%      Default |
2025-12-04T14:41:20.3579654Z |                                         |                        |                  N/A |
2025-12-04T14:41:20.3580023Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T14:41:20.3582378Z 
2025-12-04T14:41:20.3582562Z +-----------------------------------------------------------------------------------------+
2025-12-04T14:41:20.3582969Z | Processes:                                                                              |
2025-12-04T14:41:20.3583419Z |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
2025-12-04T14:41:20.3583803Z |        ID   ID                                                               Usage      |
2025-12-04T14:41:20.3584357Z |=========================================================================================|
2025-12-04T14:41:20.3587441Z |  No running processes found                                                             |
2025-12-04T14:41:20.3587792Z +-----------------------------------------------------------------------------------------+
2025-12-04T14:41:20.5903758Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
2025-12-04T14:41:20.6090073Z NVIDIA L4
2025-12-04T14:41:20.6145367Z + NVIDIA_SMI_STATUS=0
2025-12-04T14:41:20.6145620Z + '[' 0 -ne 0 ']'
2025-12-04T14:41:20.6151618Z ++ nvidia-smi --list-gpus
2025-12-04T14:41:20.6152806Z ++ wc -l
2025-12-04T14:41:20.6394553Z + GPU_COUNT=1
2025-12-04T14:41:20.6394800Z + NVIDIA_SMI_STATUS=0
2025-12-04T14:41:20.6395023Z + '[' 0 -ne 0 ']'
2025-12-04T14:41:20.6395364Z + '[' 1 -le 8 ']'
2025-12-04T14:41:20.6395550Z + '[' 1 -ne 1 ']'
2025-12-04T14:41:20.6461036Z Post job cleanup.
2025-12-04T14:41:20.6520001Z Post job cleanup.
2025-12-04T14:41:20.6556067Z Post job cleanup.
2025-12-04T14:41:20.7506688Z [command]/usr/bin/git version
2025-12-04T14:41:20.7546542Z git version 2.50.1
2025-12-04T14:41:20.7579971Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/590277cf-288d-4b27-bbc2-e34b5fb61d2c/.gitconfig'
2025-12-04T14:41:20.7589622Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/590277cf-288d-4b27-bbc2-e34b5fb61d2c' before making global git config changes
2025-12-04T14:41:20.7590564Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T14:41:20.7594886Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T14:41:20.7639792Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T14:41:20.7677908Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T14:41:20.8017741Z Entering 'android/libs/fbjni'
2025-12-04T14:41:20.8083346Z Entering 'third_party/FP16'
2025-12-04T14:41:20.8148324Z Entering 'third_party/FXdiv'
2025-12-04T14:41:20.8213231Z Entering 'third_party/NNPACK'
2025-12-04T14:41:20.8276165Z Entering 'third_party/NVTX'
2025-12-04T14:41:20.8343240Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T14:41:20.8408015Z Entering 'third_party/XNNPACK'
2025-12-04T14:41:20.8487661Z Entering 'third_party/aiter'
2025-12-04T14:41:20.8553630Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T14:41:20.8627911Z Entering 'third_party/benchmark'
2025-12-04T14:41:20.8693598Z Entering 'third_party/composable_kernel'
2025-12-04T14:41:20.8766654Z Entering 'third_party/cpp-httplib'
2025-12-04T14:41:20.8832748Z Entering 'third_party/cpuinfo'
2025-12-04T14:41:20.8897654Z Entering 'third_party/cudnn_frontend'
2025-12-04T14:41:20.8963973Z Entering 'third_party/cutlass'
2025-12-04T14:41:20.9040736Z Entering 'third_party/fbgemm'
2025-12-04T14:41:20.9105607Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T14:41:20.9171517Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T14:41:20.9242133Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T14:41:20.9306394Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T14:41:20.9379393Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T14:41:20.9443691Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T14:41:20.9506134Z Entering 'third_party/fbgemm/external/json'
2025-12-04T14:41:20.9571742Z Entering 'third_party/flash-attention'
2025-12-04T14:41:20.9634777Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T14:41:20.9705170Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T14:41:20.9781584Z Entering 'third_party/flatbuffers'
2025-12-04T14:41:20.9850178Z Entering 'third_party/fmt'
2025-12-04T14:41:20.9914462Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T14:41:20.9979463Z Entering 'third_party/gloo'
2025-12-04T14:41:21.0044600Z Entering 'third_party/googletest'
2025-12-04T14:41:21.0111043Z Entering 'third_party/ideep'
2025-12-04T14:41:21.0179645Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T14:41:21.0251844Z Entering 'third_party/ittapi'
2025-12-04T14:41:21.0319753Z Entering 'third_party/kineto'
2025-12-04T14:41:21.0383114Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T14:41:21.0445108Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T14:41:21.0508854Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T14:41:21.0573887Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T14:41:21.0637683Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T14:41:21.0701393Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T14:41:21.0774712Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T14:41:21.0838163Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T14:41:21.0903716Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T14:41:21.0968273Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T14:41:21.1033230Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T14:41:21.1095698Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T14:41:21.1160822Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T14:41:21.1230770Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T14:41:21.1293778Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T14:41:21.1358298Z Entering 'third_party/kleidiai'
2025-12-04T14:41:21.1424205Z Entering 'third_party/mimalloc'
2025-12-04T14:41:21.1496535Z Entering 'third_party/nlohmann'
2025-12-04T14:41:21.1568491Z Entering 'third_party/onnx'
2025-12-04T14:41:21.1647874Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T14:41:21.1717147Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T14:41:21.1783203Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T14:41:21.1847201Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T14:41:21.1915337Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T14:41:21.1984399Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T14:41:21.2057914Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T14:41:21.2127103Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T14:41:21.2190739Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T14:41:21.2251915Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T14:41:21.2321020Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T14:41:21.2385174Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T14:41:21.2476727Z Entering 'third_party/pocketfft'
2025-12-04T14:41:21.2542458Z Entering 'third_party/protobuf'
2025-12-04T14:41:21.2608960Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T14:41:21.2673495Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T14:41:21.2740336Z Entering 'third_party/psimd'
2025-12-04T14:41:21.2811738Z Entering 'third_party/pthreadpool'
2025-12-04T14:41:21.2876834Z Entering 'third_party/pybind11'
2025-12-04T14:41:21.2943660Z Entering 'third_party/python-peachpy'
2025-12-04T14:41:21.3006939Z Entering 'third_party/sleef'
2025-12-04T14:41:21.3080292Z Entering 'third_party/tensorpipe'
2025-12-04T14:41:21.3142773Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T14:41:21.3222205Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T14:41:21.3285947Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T14:41:21.3357908Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T14:41:21.3427982Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T14:41:21.3523259Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T14:41:21.3545617Z http.https://github.com/.extraheader
2025-12-04T14:41:21.3554982Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
2025-12-04T14:41:21.3587824Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T14:41:21.3919157Z Entering 'android/libs/fbjni'
2025-12-04T14:41:21.3962922Z http.https://github.com/.extraheader
2025-12-04T14:41:21.4004272Z Entering 'third_party/FP16'
2025-12-04T14:41:21.4045104Z http.https://github.com/.extraheader
2025-12-04T14:41:21.4085874Z Entering 'third_party/FXdiv'
2025-12-04T14:41:21.4130154Z http.https://github.com/.extraheader
2025-12-04T14:41:21.4172291Z Entering 'third_party/NNPACK'
2025-12-04T14:41:21.4213595Z http.https://github.com/.extraheader
2025-12-04T14:41:21.4259391Z Entering 'third_party/NVTX'
2025-12-04T14:41:21.4301535Z http.https://github.com/.extraheader
2025-12-04T14:41:21.4355274Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T14:41:21.4402665Z http.https://github.com/.extraheader
2025-12-04T14:41:21.4444199Z Entering 'third_party/XNNPACK'
2025-12-04T14:41:21.4491712Z http.https://github.com/.extraheader
2025-12-04T14:41:21.4552047Z Entering 'third_party/aiter'
2025-12-04T14:41:21.4594779Z http.https://github.com/.extraheader
2025-12-04T14:41:21.4635580Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T14:41:21.4681879Z http.https://github.com/.extraheader
2025-12-04T14:41:21.4738680Z Entering 'third_party/benchmark'
2025-12-04T14:41:21.4783409Z http.https://github.com/.extraheader
2025-12-04T14:41:21.4824320Z Entering 'third_party/composable_kernel'
2025-12-04T14:41:21.4868144Z http.https://github.com/.extraheader
2025-12-04T14:41:21.4914936Z Entering 'third_party/cpp-httplib'
2025-12-04T14:41:21.4965732Z http.https://github.com/.extraheader
2025-12-04T14:41:21.5006493Z Entering 'third_party/cpuinfo'
2025-12-04T14:41:21.5047037Z http.https://github.com/.extraheader
2025-12-04T14:41:21.5087195Z Entering 'third_party/cudnn_frontend'
2025-12-04T14:41:21.5130756Z http.https://github.com/.extraheader
2025-12-04T14:41:21.5172772Z Entering 'third_party/cutlass'
2025-12-04T14:41:21.5215701Z http.https://github.com/.extraheader
2025-12-04T14:41:21.5265807Z Entering 'third_party/fbgemm'
2025-12-04T14:41:21.5309529Z http.https://github.com/.extraheader
2025-12-04T14:41:21.5357934Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T14:41:21.5399760Z http.https://github.com/.extraheader
2025-12-04T14:41:21.5439322Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T14:41:21.5480769Z http.https://github.com/.extraheader
2025-12-04T14:41:21.5532458Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T14:41:21.5575831Z http.https://github.com/.extraheader
2025-12-04T14:41:21.5616609Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T14:41:21.5655676Z http.https://github.com/.extraheader
2025-12-04T14:41:21.5704156Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T14:41:21.5753021Z http.https://github.com/.extraheader
2025-12-04T14:41:21.5793733Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T14:41:21.5835546Z http.https://github.com/.extraheader
2025-12-04T14:41:21.5874910Z Entering 'third_party/fbgemm/external/json'
2025-12-04T14:41:21.5922529Z http.https://github.com/.extraheader
2025-12-04T14:41:21.5967157Z Entering 'third_party/flash-attention'
2025-12-04T14:41:21.6010664Z http.https://github.com/.extraheader
2025-12-04T14:41:21.6051161Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T14:41:21.6092296Z http.https://github.com/.extraheader
2025-12-04T14:41:21.6139678Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T14:41:21.6180991Z http.https://github.com/.extraheader
2025-12-04T14:41:21.6233337Z Entering 'third_party/flatbuffers'
2025-12-04T14:41:21.6275363Z http.https://github.com/.extraheader
2025-12-04T14:41:21.6318963Z Entering 'third_party/fmt'
2025-12-04T14:41:21.6361157Z http.https://github.com/.extraheader
2025-12-04T14:41:21.6403699Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T14:41:21.6447772Z http.https://github.com/.extraheader
2025-12-04T14:41:21.6487831Z Entering 'third_party/gloo'
2025-12-04T14:41:21.6531256Z http.https://github.com/.extraheader
2025-12-04T14:41:21.6580700Z Entering 'third_party/googletest'
2025-12-04T14:41:21.6623305Z http.https://github.com/.extraheader
2025-12-04T14:41:21.6664629Z Entering 'third_party/ideep'
2025-12-04T14:41:21.6710664Z http.https://github.com/.extraheader
2025-12-04T14:41:21.6755590Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T14:41:21.6798299Z http.https://github.com/.extraheader
2025-12-04T14:41:21.6846069Z Entering 'third_party/ittapi'
2025-12-04T14:41:21.6889304Z http.https://github.com/.extraheader
2025-12-04T14:41:21.6929004Z Entering 'third_party/kineto'
2025-12-04T14:41:21.6971144Z http.https://github.com/.extraheader
2025-12-04T14:41:21.7014758Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T14:41:21.7056317Z http.https://github.com/.extraheader
2025-12-04T14:41:21.7095815Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T14:41:21.7137282Z http.https://github.com/.extraheader
2025-12-04T14:41:21.7178737Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T14:41:21.7222083Z http.https://github.com/.extraheader
2025-12-04T14:41:21.7264641Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T14:41:21.7314284Z http.https://github.com/.extraheader
2025-12-04T14:41:21.7361750Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T14:41:21.7404628Z http.https://github.com/.extraheader
2025-12-04T14:41:21.7444052Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T14:41:21.7486172Z http.https://github.com/.extraheader
2025-12-04T14:41:21.7530803Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T14:41:21.7572227Z http.https://github.com/.extraheader
2025-12-04T14:41:21.7618110Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T14:41:21.7660399Z http.https://github.com/.extraheader
2025-12-04T14:41:21.7703128Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T14:41:21.7745828Z http.https://github.com/.extraheader
2025-12-04T14:41:21.7787365Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T14:41:21.7830311Z http.https://github.com/.extraheader
2025-12-04T14:41:21.7877149Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T14:41:21.7923558Z http.https://github.com/.extraheader
2025-12-04T14:41:21.7963693Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T14:41:21.8004511Z http.https://github.com/.extraheader
2025-12-04T14:41:21.8048646Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T14:41:21.8090746Z http.https://github.com/.extraheader
2025-12-04T14:41:21.8144217Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T14:41:21.8191566Z http.https://github.com/.extraheader
2025-12-04T14:41:21.8233346Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T14:41:21.8275206Z http.https://github.com/.extraheader
2025-12-04T14:41:21.8321507Z Entering 'third_party/kleidiai'
2025-12-04T14:41:21.8368997Z http.https://github.com/.extraheader
2025-12-04T14:41:21.8409138Z Entering 'third_party/mimalloc'
2025-12-04T14:41:21.8450730Z http.https://github.com/.extraheader
2025-12-04T14:41:21.8492425Z Entering 'third_party/nlohmann'
2025-12-04T14:41:21.8535338Z http.https://github.com/.extraheader
2025-12-04T14:41:21.8583084Z Entering 'third_party/onnx'
2025-12-04T14:41:21.8625680Z http.https://github.com/.extraheader
2025-12-04T14:41:21.8680181Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T14:41:21.8722389Z http.https://github.com/.extraheader
2025-12-04T14:41:21.8770786Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T14:41:21.8811865Z http.https://github.com/.extraheader
2025-12-04T14:41:21.8853864Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T14:41:21.8895575Z http.https://github.com/.extraheader
2025-12-04T14:41:21.8935034Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T14:41:21.8981301Z http.https://github.com/.extraheader
2025-12-04T14:41:21.9030743Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T14:41:21.9071842Z http.https://github.com/.extraheader
2025-12-04T14:41:21.9112741Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T14:41:21.9152625Z http.https://github.com/.extraheader
2025-12-04T14:41:21.9198796Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T14:41:21.9240653Z http.https://github.com/.extraheader
2025-12-04T14:41:21.9288842Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T14:41:21.9330794Z http.https://github.com/.extraheader
2025-12-04T14:41:21.9375445Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T14:41:21.9416102Z http.https://github.com/.extraheader
2025-12-04T14:41:21.9454247Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T14:41:21.9495578Z http.https://github.com/.extraheader
2025-12-04T14:41:21.9537831Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T14:41:21.9580087Z http.https://github.com/.extraheader
2025-12-04T14:41:21.9621331Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T14:41:21.9662235Z http.https://github.com/.extraheader
2025-12-04T14:41:21.9724807Z Entering 'third_party/pocketfft'
2025-12-04T14:41:21.9772317Z http.https://github.com/.extraheader
2025-12-04T14:41:21.9818240Z Entering 'third_party/protobuf'
2025-12-04T14:41:21.9860677Z http.https://github.com/.extraheader
2025-12-04T14:41:21.9901293Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T14:41:21.9942546Z http.https://github.com/.extraheader
2025-12-04T14:41:21.9986672Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T14:41:22.0030585Z http.https://github.com/.extraheader
2025-12-04T14:41:22.0074736Z Entering 'third_party/psimd'
2025-12-04T14:41:22.0119311Z http.https://github.com/.extraheader
2025-12-04T14:41:22.0164751Z Entering 'third_party/pthreadpool'
2025-12-04T14:41:22.0208427Z http.https://github.com/.extraheader
2025-12-04T14:41:22.0248037Z Entering 'third_party/pybind11'
2025-12-04T14:41:22.0290500Z http.https://github.com/.extraheader
2025-12-04T14:41:22.0337626Z Entering 'third_party/python-peachpy'
2025-12-04T14:41:22.0379936Z http.https://github.com/.extraheader
2025-12-04T14:41:22.0419832Z Entering 'third_party/sleef'
2025-12-04T14:41:22.0459894Z http.https://github.com/.extraheader
2025-12-04T14:41:22.0499810Z Entering 'third_party/tensorpipe'
2025-12-04T14:41:22.0541008Z http.https://github.com/.extraheader
2025-12-04T14:41:22.0580987Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T14:41:22.0623174Z http.https://github.com/.extraheader
2025-12-04T14:41:22.0663665Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T14:41:22.0705136Z http.https://github.com/.extraheader
2025-12-04T14:41:22.0745662Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T14:41:22.0794329Z http.https://github.com/.extraheader
2025-12-04T14:41:22.0839165Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T14:41:22.0880507Z http.https://github.com/.extraheader
2025-12-04T14:41:22.0916902Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T14:41:22.0962894Z http.https://github.com/.extraheader
2025-12-04T14:41:22.1030689Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.1062640Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T14:41:22.1407145Z Entering 'android/libs/fbjni'
2025-12-04T14:41:22.1434008Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config	remote.origin.url
2025-12-04T14:41:22.1454351Z Entering 'third_party/FP16'
2025-12-04T14:41:22.1488091Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config	remote.origin.url
2025-12-04T14:41:22.1507593Z Entering 'third_party/FXdiv'
2025-12-04T14:41:22.1534867Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config	remote.origin.url
2025-12-04T14:41:22.1555464Z Entering 'third_party/NNPACK'
2025-12-04T14:41:22.1588441Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config	remote.origin.url
2025-12-04T14:41:22.1608382Z Entering 'third_party/NVTX'
2025-12-04T14:41:22.1634978Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config	remote.origin.url
2025-12-04T14:41:22.1656301Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T14:41:22.1684788Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config	remote.origin.url
2025-12-04T14:41:22.1705342Z Entering 'third_party/XNNPACK'
2025-12-04T14:41:22.1733888Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config	remote.origin.url
2025-12-04T14:41:22.1768253Z Entering 'third_party/aiter'
2025-12-04T14:41:22.1796402Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config	remote.origin.url
2025-12-04T14:41:22.1816418Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T14:41:22.1845129Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config	remote.origin.url
2025-12-04T14:41:22.1873967Z Entering 'third_party/benchmark'
2025-12-04T14:41:22.1902968Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T14:41:22.1924594Z Entering 'third_party/composable_kernel'
2025-12-04T14:41:22.1953154Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config	remote.origin.url
2025-12-04T14:41:22.1981916Z Entering 'third_party/cpp-httplib'
2025-12-04T14:41:22.2014997Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config	remote.origin.url
2025-12-04T14:41:22.2035610Z Entering 'third_party/cpuinfo'
2025-12-04T14:41:22.2064081Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config	remote.origin.url
2025-12-04T14:41:22.2085134Z Entering 'third_party/cudnn_frontend'
2025-12-04T14:41:22.2113767Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config	remote.origin.url
2025-12-04T14:41:22.2135971Z Entering 'third_party/cutlass'
2025-12-04T14:41:22.2164313Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config	remote.origin.url
2025-12-04T14:41:22.2193841Z Entering 'third_party/fbgemm'
2025-12-04T14:41:22.2223347Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config	remote.origin.url
2025-12-04T14:41:22.2245415Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T14:41:22.2272697Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config	remote.origin.url
2025-12-04T14:41:22.2292701Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T14:41:22.2322102Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config	remote.origin.url
2025-12-04T14:41:22.2349487Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T14:41:22.2376745Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config	remote.origin.url
2025-12-04T14:41:22.2396218Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T14:41:22.2425202Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config	remote.origin.url
2025-12-04T14:41:22.2453548Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T14:41:22.2481445Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config	remote.origin.url
2025-12-04T14:41:22.2499986Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T14:41:22.2527170Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config	remote.origin.url
2025-12-04T14:41:22.2546460Z Entering 'third_party/fbgemm/external/json'
2025-12-04T14:41:22.2574093Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config	remote.origin.url
2025-12-04T14:41:22.2600356Z Entering 'third_party/flash-attention'
2025-12-04T14:41:22.2630576Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config	remote.origin.url
2025-12-04T14:41:22.2649785Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T14:41:22.2676712Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config	remote.origin.url
2025-12-04T14:41:22.2701547Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T14:41:22.2728922Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config	remote.origin.url
2025-12-04T14:41:22.2759183Z Entering 'third_party/flatbuffers'
2025-12-04T14:41:22.2790005Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config	remote.origin.url
2025-12-04T14:41:22.2814085Z Entering 'third_party/fmt'
2025-12-04T14:41:22.2843506Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config	remote.origin.url
2025-12-04T14:41:22.2864169Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T14:41:22.2893128Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config	remote.origin.url
2025-12-04T14:41:22.2918760Z Entering 'third_party/gloo'
2025-12-04T14:41:22.2945847Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config	remote.origin.url
2025-12-04T14:41:22.2966774Z Entering 'third_party/googletest'
2025-12-04T14:41:22.2995086Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config	remote.origin.url
2025-12-04T14:41:22.3015856Z Entering 'third_party/ideep'
2025-12-04T14:41:22.3044877Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config	remote.origin.url
2025-12-04T14:41:22.3063035Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T14:41:22.3098277Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config	remote.origin.url
2025-12-04T14:41:22.3126202Z Entering 'third_party/ittapi'
2025-12-04T14:41:22.3154268Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config	remote.origin.url
2025-12-04T14:41:22.3174859Z Entering 'third_party/kineto'
2025-12-04T14:41:22.3210994Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config	remote.origin.url
2025-12-04T14:41:22.3229696Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T14:41:22.3257231Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config	remote.origin.url
2025-12-04T14:41:22.3275395Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T14:41:22.3303780Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config	remote.origin.url
2025-12-04T14:41:22.3325529Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T14:41:22.3353423Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config	remote.origin.url
2025-12-04T14:41:22.3373649Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T14:41:22.3402080Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config	remote.origin.url
2025-12-04T14:41:22.3422981Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T14:41:22.3451011Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config	remote.origin.url
2025-12-04T14:41:22.3469996Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T14:41:22.3497459Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config	remote.origin.url
2025-12-04T14:41:22.3520277Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T14:41:22.3546910Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config	remote.origin.url
2025-12-04T14:41:22.3566939Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T14:41:22.3594611Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config	remote.origin.url
2025-12-04T14:41:22.3615323Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T14:41:22.3651775Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config	remote.origin.url
2025-12-04T14:41:22.3672760Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T14:41:22.3701528Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config	remote.origin.url
2025-12-04T14:41:22.3726174Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T14:41:22.3753411Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T14:41:22.3772383Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T14:41:22.3801848Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T14:41:22.3823008Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T14:41:22.3851133Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T14:41:22.3876705Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T14:41:22.3904910Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config	remote.origin.url
2025-12-04T14:41:22.3926304Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T14:41:22.3953432Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config	remote.origin.url
2025-12-04T14:41:22.3976460Z Entering 'third_party/kleidiai'
2025-12-04T14:41:22.4005122Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config	remote.origin.url
2025-12-04T14:41:22.4027064Z Entering 'third_party/mimalloc'
2025-12-04T14:41:22.4054546Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config	remote.origin.url
2025-12-04T14:41:22.4074410Z Entering 'third_party/nlohmann'
2025-12-04T14:41:22.4102617Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config	remote.origin.url
2025-12-04T14:41:22.4133095Z Entering 'third_party/onnx'
2025-12-04T14:41:22.4160920Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config	remote.origin.url
2025-12-04T14:41:22.4193926Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T14:41:22.4223193Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T14:41:22.4246935Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T14:41:22.4275076Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config	remote.origin.url
2025-12-04T14:41:22.4295597Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T14:41:22.4323921Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T14:41:22.4343290Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T14:41:22.4371443Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config	remote.origin.url
2025-12-04T14:41:22.4391765Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T14:41:22.4421016Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config	remote.origin.url
2025-12-04T14:41:22.4438929Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T14:41:22.4466077Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config	remote.origin.url
2025-12-04T14:41:22.4486593Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T14:41:22.4514323Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config	remote.origin.url
2025-12-04T14:41:22.4534918Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T14:41:22.4566274Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config	remote.origin.url
2025-12-04T14:41:22.4585203Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T14:41:22.4620474Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T14:41:22.4636881Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T14:41:22.4664490Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T14:41:22.4685907Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T14:41:22.4713868Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T14:41:22.4738466Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T14:41:22.4771552Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config	remote.origin.url
2025-12-04T14:41:22.4810628Z Entering 'third_party/pocketfft'
2025-12-04T14:41:22.4839017Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config	remote.origin.url
2025-12-04T14:41:22.4857742Z Entering 'third_party/protobuf'
2025-12-04T14:41:22.4886085Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config	remote.origin.url
2025-12-04T14:41:22.4907893Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T14:41:22.4935805Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T14:41:22.4954819Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T14:41:22.4990328Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config	remote.origin.url
2025-12-04T14:41:22.5012052Z Entering 'third_party/psimd'
2025-12-04T14:41:22.5040281Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config	remote.origin.url
2025-12-04T14:41:22.5059000Z Entering 'third_party/pthreadpool'
2025-12-04T14:41:22.5087024Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config	remote.origin.url
2025-12-04T14:41:22.5106666Z Entering 'third_party/pybind11'
2025-12-04T14:41:22.5135593Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T14:41:22.5156122Z Entering 'third_party/python-peachpy'
2025-12-04T14:41:22.5183767Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config	remote.origin.url
2025-12-04T14:41:22.5204233Z Entering 'third_party/sleef'
2025-12-04T14:41:22.5233125Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config	remote.origin.url
2025-12-04T14:41:22.5254034Z Entering 'third_party/tensorpipe'
2025-12-04T14:41:22.5283021Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config	remote.origin.url
2025-12-04T14:41:22.5302474Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T14:41:22.5331816Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config	remote.origin.url
2025-12-04T14:41:22.5351996Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T14:41:22.5380432Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config	remote.origin.url
2025-12-04T14:41:22.5400635Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T14:41:22.5428075Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config	remote.origin.url
2025-12-04T14:41:22.5447606Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T14:41:22.5474884Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T14:41:22.5495114Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T14:41:22.5522959Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config	remote.origin.url
2025-12-04T14:41:22.5573220Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5602058Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5628991Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5654986Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5681120Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5707333Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5734181Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5760027Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5784955Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5819522Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5837962Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5864941Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5891389Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5918161Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5942842Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5968311Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.5993043Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6019360Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6044655Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6070318Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6095229Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6121771Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6147543Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6171524Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6196892Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6221524Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6246532Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6270414Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6296705Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6321439Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6346253Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6372097Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6397552Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6422845Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6448402Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6473973Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6499741Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6526294Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6550681Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6577197Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6603114Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6629462Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6657418Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6683105Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6709249Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6734663Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6760956Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6791486Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6816879Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6841759Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6867501Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6892528Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6917741Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6941698Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6966997Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.6991455Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7018539Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7042804Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7067442Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7092969Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7118647Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7142595Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7169202Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7193163Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7218477Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7247991Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7272606Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7298147Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7326220Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7350891Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7374814Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7399620Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7427536Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7451993Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7477076Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7503291Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7529272Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7555043Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7580290Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7608801Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7633893Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T14:41:22.7750219Z A job completed hook has been configured by the self-hosted runner administrator
2025-12-04T14:41:22.7766218Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh'
2025-12-04T14:41:22.7773043Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T14:41:22.7773330Z ##[endgroup]
2025-12-04T14:41:29.8031856Z Cleaning up orphan processes